Mathematician Agents
A mathematician agent is an agent whose skills include formal proof construction and whose tools include proof assistants (Lean, Agda, Coq, Isabelle). Within the agential semioverse framework, a mathematician agent is a triple (Sigma, K, Pi) where:
- Sigma is bounded internal state (working memory of proof context, goals, hypotheses)
- K is a set of interaction operators (proof tactics, term construction, type-checking)
- Pi is a policy selecting admissible operators given the current state and universe
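As a sketch, the triple can be rendered as a Lean 4 structure. All names here are illustrative, invented for this example, and not part of any library:

```lean
-- Hypothetical sketch of the (Sigma, K, Pi) triple.
structure ProofState where
  goals      : List String   -- Sigma: bounded internal state (open goals …)
  hypotheses : List String   -- … and the hypotheses in context

inductive Operator where     -- K: interaction operators
  | applyTactic   (name : String)
  | constructTerm (term : String)
  | typeCheck

-- Pi: a policy selecting admissible operators given the current state
def Policy := ProofState → List Operator
```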
The key structural insight: the proof checker neutralizes hallucination. Unlike natural-language generation where errors propagate silently, a proof assistant rejects invalid reasoning immediately. This makes the agent loop fundamentally different from open-ended text generation — every proposed step is verified or rejected by the tool.
The agent loop
The core workflow of a mathematician agent:
- Read the specification — understand what needs to be proved and why
- State the theorem — write a type signature that precisely captures the claim
- Propose proof steps — generate tactic applications or term constructions
- Verify with the checker — `lake build` or `agda --check` provides ground truth
- Interpret errors — compiler feedback is the primary learning signal
- Iterate — fix errors, decompose goals, search for lemmas
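One pass through this loop can be illustrated in Lean 4. This is a hedged sketch with invented theorem names; only `Nat.add_comm` is a real library lemma:

```lean
-- State the theorem with a placeholder proof, then let the checker respond.
theorem add_comm_example (m n : Nat) : m + n = n + m := by
  sorry  -- `lake build` accepts the statement but warns: declaration uses 'sorry'

-- Interpret the feedback and propose a step that closes the goal.
theorem add_comm_example' (m n : Nat) : m + n = n + m :=
  Nat.add_comm m n  -- accepted: the checker verifies the term
```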
This loop is a semiotic process: each proof attempt is a sign, the compiler’s response is an interpretant, and the agent’s revised attempt is the next sign in the chain. The trace comonad G records this provenance — each proved theorem carries the history of its construction.
Landscape of LLM + proof assistant systems
Lean 4 ecosystem (most active)
- LeanDojo — gym-like environment for ML-driven proving; extracts training data from Mathlib4
- ReProver — retrieval-augmented tactic generation; searches Mathlib for relevant premises
- LeanCopilot — native LLM integration in the Lean editor; suggests tactics in real-time
- DeepSeek-Prover-V2 — achieved 88.9% on the miniF2F-test benchmark
- LeanAide — autoformalization from natural language to Lean 4
- leanprover/skills — official Lean 4 skills for Claude Code
Other proof assistants
- LEGO-Prover (Isabelle) — grows a skill library of verified lemmas; each proved lemma becomes available for future proofs
- Prover Agent — agent-based framework for formal proofs with reflection and iterative repair
- AGDA2TRAIN (NeurIPS 2024) — the first large-scale Agda dataset for ML, extracting intermediate proof states
Key findings from research
- Agent architectures beat single-shot generation — decomposition, reflection, and iterative repair are standard practice
- Bottom-up construction works best — prove helper lemmas first, main theorem last (LEGO-Prover validates this)
- The compiler feedback loop is the core mechanism — `lake build` → read errors → fix → retry
- Bite-sized goals — fill one `sorry` at a time, compile after each
- Benchmark progress is dramatic — miniF2F went from ~50% to ~99% in roughly two years
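The bottom-up finding can be shown in a toy Lean 4 example (theorem names invented for illustration, not drawn from LEGO-Prover):

```lean
-- Prove the helper lemma first; it then becomes available to later proofs.
theorem helper (n : Nat) : n + 0 = n := Nat.add_zero n

-- The main theorem comes last, by rewriting with the helper.
theorem main_thm (m n : Nat) : (m + 0) + (n + 0) = m + n := by
  rw [helper, helper]
```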
Mathematician agents in the agential semioverse
Within the agential semioverse framework, a mathematician agent’s proof-writing skills are admissible behaviors in the fusion-theoretic sense:
- Fragment-preserving: proofs operate within bounded contexts (fragments); a proof about Heyting algebras does not require understanding all of mathematics
- Modal-compatible: proved theorems are habitual — once verified, they settle into stable knowledge (j(theorem) = theorem)
- Trace-compatible: proofs carry provenance of their construction; the chain of proof attempts, compiler errors, and fixes is itself semiotic data
The fusion principle applies: if two proof strategies agree on all fragments (produce the same theorems from the same hypotheses), they are congruent and may be identified. The agent’s skill library is the syntactic component; the verified theorems are the semantic component; fusion ensures coherence between them.
Constructive reasoning and AI
Most LLM training data uses classical mathematics. This creates a systematic bias: models tend to reach for proof by contradiction, the law of excluded middle, and case splits on P ∨ ¬P — none of which are available in intuitionistic logic.
For the semiotic universe formalization, this means:
- The agent must be explicitly constrained to avoid classical axioms
- `#print axioms` in Lean and the absence of LEM postulates in Agda serve as verification
- The gap between `a` and `¬¬a` in a Heyting algebra is not a deficiency — it represents genuine indeterminacy, meanings not yet determined by relational activity
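In Lean 4 this check is concrete. A sketch with invented theorem names (the axiom lists in the comments are what Lean typically reports):

```lean
-- Double-negation introduction is constructive: no axioms needed.
theorem dni (a : Prop) (h : a) : ¬¬a := fun hn => hn h

#print axioms dni   -- 'dni' does not depend on any axioms

-- The converse, double-negation elimination, requires classical reasoning.
theorem dne (a : Prop) (h : ¬¬a) : a := Classical.byContradiction h

#print axioms dne   -- depends on Classical.choice, propext, Quot.sound
```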
This is an underserved area in the AI + formal verification landscape. The AGDA2TRAIN dataset (NeurIPS 2024) and “Learn from Failure” (arXiv 2024) represent early steps toward training models specifically for constructive reasoning.
Open questions
- Can an agent learn constructive proof patterns from classical training data, guided only by the proof checker’s rejections?
- How does the agent’s proof search strategy relate to the fragment structure of the semiotic universe?
- Is there a natural notion of “proof complexity” that corresponds to the ordering on partial semiotic structures?
- Can the fusion principle be used to identify equivalent proof strategies and simplify the agent’s skill library?
References
- Formal Mathematical Reasoning: A New Frontier in AI — ICML 2025 position paper
- AI will make formal verification go mainstream — Martin Kleppmann
- AGDA2TRAIN — NeurIPS 2024
- LeanDojo — Lean 4 ML environment
- Survey on Deep Learning for Theorem Proving — COLM 2024