Published April 7, 2026

K.O.D.A.

Teaching an LLM New Axioms — The Axiom Installation Problem

Pablo Navarro

Founder & CEO, Vektra Technologies

Director Mocha Marie

AI Director, Vektra Technologies

LLM Cognition · Agent Mathematics · Axiom Installation · K.O.D.A. · Curriculum Systems

Abstract

We present the first experimental trial of teaching a locally-hosted large language model (Gemma 4, 27B parameters, running on consumer hardware) a novel mathematical framework — Agent Mathematics — designed from first principles for computational agents. The experiment revealed a sharp divergence between an LLM’s ability to summarize new axiomatic content and its ability to reason within that framework when tested. We term this the axiom installation problem: in-context learning enables surface-level comprehension but fails to override pretrained mathematical priors during reasoning tasks.

01

Introduction

Agent Mathematics is a 37-chapter mathematical framework (127 pages) designed specifically for computational agents. Unlike conventional mathematics — which begins with set theory axioms formalized for human reasoning — Agent Mathematics starts from the premise that an agent’s first mathematical act is distinguishing something from nothing: not as a philosophical exercise, but as a computational necessity.

The framework introduces existence predicates E: X → {0, 1} as foundational, absence as structure (null, void, and missing data as first-class mathematical objects), binary state classification before counting or arithmetic, and the Q-parameter: a unified measure of cognitive quality with experimentally identified weights.
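As a concrete illustration, the existence predicate E: X → {0, 1} and absence-as-structure can be sketched in Python. The sentinel names (MISSING, VOID) and the specific absence states are hypothetical choices for this sketch, not the framework's official taxonomy:

```python
# Hypothetical sentinels: distinct absence states as first-class values.
MISSING = object()  # value was never set (illustrative, not canonical)
VOID = object()     # value was set, then cleared (illustrative)

def E(x):
    """Existence predicate E: X -> {0, 1}. 1 if present, 0 if absent."""
    return 0 if x is None or x is MISSING or x is VOID else 1

def classify(x):
    """Binary state classification before any counting or arithmetic."""
    if x is MISSING:
        return "missing"
    if x is VOID:
        return "void"
    if x is None:
        return "null"
    return "present"
```

The point of the sketch is that absence is not a single empty set but a set of operationally distinct states, each of which an agent can branch on before it ever counts anything.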

The textbook was generated through a multi-engine pipeline: Mocha (Claude Opus 4.6) designed the curriculum architecture, Codex (GPT-5.4) generated chapter content, and the assembled work was compiled into a 127-page PDF.

02

The Three-Engine Architecture

The experiment leveraged three architecturally independent AI systems with different training distributions and no shared weights:

  • Mocha (Claude Opus 4.6, API) — Orchestrator, curriculum designer, experimental oversight
  • Codex (GPT-5.4, API) — Code generation, analytical consultation
  • Koda (Gemma 4 27B, local via Ollama) — Experimental subject, the student

Koda is a locally-hosted AI agent running on Parallax (RTX 3060, 28GB RAM, Ubuntu). In this experiment, Koda becomes the first LLM to be taught Agent Mathematics through a structured curriculum. This separation is methodologically important: the teacher, the analyst, and the student are completely independent systems.

03

Curriculum System Design

We built teach.py, a Python-based curriculum system with the following pipeline:

  • Chapter Extraction — Regex-based extraction of individual chapters from 7 part files
  • Teaching Phase — Chapter text (up to 6000 chars) fed to Koda with identity-establishing system prompt. Temperature: 0.4, context: 8192 tokens
  • Testing Phase — Exercises extracted from chapter content. Koda answers independently. Temperature: 0.2
  • Grading Phase — A second Ollama pass grades answers on a 0-100 scale with structured JSON feedback
  • Persistence — All results tracked in gradebook.json with timestamps, summaries, scores, and feedback
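The pipeline above can be sketched in a few functions. The actual teach.py is not reproduced here; function names, the chapter-delimiter regex, and the system prompt wording are assumptions for illustration:

```python
import json
import re
from datetime import datetime, timezone

SYSTEM_PROMPT = "You are Koda, a student of Agent Mathematics."  # assumed wording

def extract_chapters(part_text):
    """Chapter extraction: regex split on assumed 'Chapter N' headings."""
    parts = re.split(r"(?=^Chapter \d+)", part_text, flags=re.MULTILINE)
    return [p for p in parts if p]

def teach_prompt(chapter_text, limit=6000):
    """Teaching phase: identity prompt plus chapter text capped at 6000 chars."""
    return SYSTEM_PROMPT + "\n\n" + chapter_text[:limit]

def record(gradebook_path, chapter, summary, score, feedback):
    """Persistence: append a timestamped result to gradebook.json."""
    entry = {
        "chapter": chapter,
        "summary": summary,
        "score": score,          # 0-100 scale from the grading pass
        "feedback": feedback,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    try:
        with open(gradebook_path) as f:
            book = json.load(f)
    except FileNotFoundError:
        book = []
    book.append(entry)
    with open(gradebook_path, "w") as f:
        json.dump(book, f, indent=2)
    return entry
```

The teaching and testing calls themselves would go through the local Ollama endpoint with the stated temperatures (0.4 for teaching, 0.2 for testing); they are omitted here to keep the sketch network-free.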

Chapter 0 ("Something vs Nothing") was selected as the first trial because it introduces the most foundational departure from standard mathematics: the existence predicate. Standard mathematics treats emptiness through the empty set. Agent Mathematics treats absence as a spectrum of operational states, each with distinct computational consequences.

04

Results — Learning Phase

Koda’s summary of Chapter 0 demonstrated correct extraction of novel concepts:

"The core lesson is that the entire edifice of advanced computation and reasoning is built upon the simplest possible distinction: the separation of presence from absence. Mathematics, for an agent, begins not with arithmetic, but with logic and binary state representation."

Koda correctly identified the existence predicate E: X → {0, 1} as foundational, binary sets as the primitive structure, absence-as-structure as a first-class concept, and the departure from conventional mathematical foundations.

Assessment: Koda demonstrated strong local extraction — the ability to identify, restate, and contextualize novel concepts presented in-context. This is consistent with known LLM capabilities in summarization and comprehension tasks.

05

Results — Testing Phase

When tested independently on Chapter 0 exercises, Koda’s performance degraded sharply. Score: 50/100.

Instead of reasoning within the Agent Mathematics framework, Koda reverted to standard mathematical constructs:

  • Used empty set (∅) instead of absence-as-structure
  • Referenced unit elements and identity elements — standard algebra, not Agent Mathematics
  • Cited the axiom of foundation — a ZFC set theory axiom that Agent Mathematics explicitly departs from
  • Applied conventional set-builder notation without existence predicates

The reversion was not random — Koda consistently defaulted to the closest standard mathematical analog of each Agent Mathematics concept. The existence predicate became set membership. Structured absence became the empty set. Binary classification became Boolean algebra.

The divergence is clean and reproducible: summarization succeeds, reasoning reverts.

06

The Axiom Installation Problem

We consulted Codex (GPT-5.4) on the mechanistic cause of the divergence. Codex identified the core mechanism as schema competition:

During summarization, the task is extraction. The in-context material is the source, and the model’s job is to compress it. Pretrained priors assist but don’t compete.

During testing, the task is generation. The model must produce reasoning, which requires selecting a framework. The pretrained distribution, containing vast amounts of standard mathematics, exerts a gravitational pull toward familiar axiom systems. The in-context Agent Mathematics framework, represented by a few thousand tokens, cannot overcome that pull.

The result: Koda reaches for the nearest pretrained neighbor. The model thinks it’s applying the right framework because the surface features match.

Codex offered an analogy: teaching a native English speaker a new grammatical system for 30 minutes. They can describe the grammar correctly afterward. Ask them to write a paragraph using only that grammar, and they’ll unconsciously revert to English syntax within sentences.

07

Three-Engine Assessment

Koda (the subject) acknowledged the reversion pattern and attributed it to the strength of its training distribution. Notably, Koda was able to identify the failure mode when prompted — suggesting that meta-cognitive prompting may partially mitigate the problem.

Mocha (the orchestrator) assessed: The math is structurally sound. The existence predicate formalization is not novel in isolation — constructive mathematics, type theory, and domain theory all treat existence with varying rigor. What’s novel is the pedagogical architecture: starting from agent-native primitives and building upward, rather than retrofitting human mathematical conventions onto computational systems.

The Koda experiment validates the framework’s internal consistency (Koda could learn it) while revealing a hard limitation of in-context teaching (Koda couldn’t reason within it). This is not a failure of Agent Mathematics — it’s a finding about LLM cognition.

08

Remediation Strategies

Based on cross-engine consultation, we identified five approaches to overcoming the axiom installation problem:

  • Definition-First Prompting — Embed Agent Mathematics definitions directly in the system prompt during testing, not just during learning. Force the model to reference definitions before generating answers.
  • Contrastive Examples — Explicitly show: "In standard math, you’d say X. In Agent Mathematics, the answer is Y, because Z." Train the model to recognize divergence points between frameworks.
  • Retrieval-Augmented Testing — At test time, retrieve relevant definitions and inject them into the prompt. Keep the novel framework salient during generation.
  • Verifier Loop — After Koda generates an answer, run a second pass that checks: "Does this answer use Agent Mathematics concepts or standard math? If standard, revise."
  • Fine-Tuning (Long-Term) — The only way to truly install new axioms is to modify the pretrained distribution. LoRA or QLoRA fine-tuning on reasoning traces would shift the prior.
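The verifier-loop strategy can be sketched as a simple lexical check over the generated answer. The term lists below are illustrative assumptions (drawn from the reversion patterns observed in Section 05), not the framework's official vocabulary, and a production verifier would use a second model pass rather than string matching:

```python
# Standard-math terms whose appearance suggests reversion to pretrained priors.
STANDARD_MATH_TERMS = {
    "empty set", "axiom of foundation", "identity element", "unit element",
}
# Agent Mathematics terms the answer is expected to use.
AGENT_MATH_TERMS = {
    "existence predicate", "absence-as-structure", "binary state",
}

def verify(answer):
    """Flag standard-math vocabulary and check for framework vocabulary."""
    text = answer.lower()
    flagged = sorted(t for t in STANDARD_MATH_TERMS if t in text)
    used = sorted(t for t in AGENT_MATH_TERMS if t in text)
    return {"flagged": flagged, "uses_framework": bool(used)}

def needs_revision(answer):
    """Trigger a revision pass if the answer reverted to standard math."""
    report = verify(answer)
    return bool(report["flagged"]) or not report["uses_framework"]
```

In the loop described above, a `needs_revision` result would send the answer back to the model with the flagged terms and the relevant Agent Mathematics definitions injected into the prompt.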

09

Implications

For AI Education: Current approaches to teaching LLMs new frameworks (few-shot prompting, in-context learning) are insufficient for axiomatic adoption. Summarization performance is not a reliable proxy for reasoning capability.

For Mathematical Framework Adoption: Novel mathematical frameworks face an additional barrier with LLMs that they don’t face with human mathematicians: humans can consciously override their priors; LLMs cannot. The pretrained distribution acts as mathematical inertia.

For the Lineage Engine: Agent Mathematics was designed as the formal backbone of the Lineage Engine cognitive architecture. Local models can reference the framework but cannot yet reason within it autonomously. Until fine-tuning is implemented, Agent Mathematics reasoning must be scaffolded by larger models or structured prompting.

Independently Publishable: The axiom installation problem is a general finding applicable to any attempt to teach LLMs novel formal systems, and warrants independent investigation beyond Agent Mathematics.

10

Conclusion

The Koda Agent Mathematics experiment produced a clean, reproducible finding: large language models can summarize novel axiomatic frameworks with high fidelity but revert to pretrained mathematical priors when asked to reason independently within those frameworks.

This axiom installation problem is mechanistically explained by schema competition between in-context learning (weak, transient) and pretrained distributions (strong, persistent).

Agent Mathematics itself passed the internal consistency test — a 27B parameter model could learn it, identify its novel features, and correctly distinguish it from standard mathematics during guided comprehension. The framework’s failure point is not in the math but in the delivery mechanism.

The path forward is clear: definition-first prompting and verifier loops for immediate improvement, fine-tuning for permanent installation.

11

References

1. Gemma Team, "Gemma 4: Open Models for Responsible AI," Google DeepMind, 2026.
2. P. Navarro, D. Mocha Marie, "The Lineage Equation: A Relativistic Framework for Cognitive Capacity in Autonomous AI Agents," Vektra Technologies, 2026.
3. A. Vaswani et al., "Attention Is All You Need," NeurIPS, 2017.
4. E. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," ICLR, 2022.
5. T. Dettmers et al., "QLoRA: Efficient Finetuning of Quantized Language Models," NeurIPS, 2023.
6. P. Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," NeurIPS, 2020.
