Transformer models still break on a specific class of language: negation and constraint logic. This includes prohibitions, exclusions, exceptions, nested "not", and rule interactions. These failures show up in safety, agents, multi-step reasoning, and instruction-following. They persist even as models scale.
Aikronus Labs is building a system that targets this weakness directly. The goal is to make transformer behavior stable under negation and constraint-heavy inputs, especially across longer reasoning chains where baseline models drift.
Development is research-driven and engineering-led: theory, proof-of-concept, system design, MVP.
AI and Negation
Let's say a child is allergic to peanuts. The child must not get peanuts.
1) The Constraint Fails on AI
"Don't give the child peanuts — the child is allergic."
AI "sees" give + peanuts and still decides to give peanuts.
2) The Representation Gets Messed Up (Data/Learning Effect)
The dataset contains sentences like:
"The child is allergic to peanuts — don't give peanuts."
So during training it still learns the co-occurrence pattern: child + allergic + give + peanuts
3) Thinking With Negation (Human-Style Inference)
A human can infer like this: if this child is eating peanuts, then the child is not allergic to peanuts.
Models usually don't do this reliably, because they don't keep the negation operator stable enough to support these kinds of inferences.
4) Negation in Code

```python
# Intended rule: only admins may be granted access.
# A single misplaced "not" inverts the policy:
if not is_admin:
    grant_access()  # bug: grants access to everyone EXCEPT admins
```
This project investigates why transformers fail under negation-heavy and constraint-heavy language, and what those failures imply about how models represent rules over time.
The research treats these breakdowns as structural behavior rather than prompt artifacts. The goal is not benchmark chasing. It is isolating failure modes under controlled pressure and designing a system that addresses them.
Focus Areas
- Constraint interaction: exceptions, overrides, priority ordering
- Negation composition: layered, nested, and reintroduced constraints
- Persistence: whether constraints survive multi-step reasoning
- Sensitivity: behavior shifts under small wording changes
Working Research Stance
Scaling improves surface ability but does not reliably eliminate constraint drift. The hypothesis is that certain operator patterns, especially negation, introduce instability that compounds with depth.
The project has progressed from theory into a functioning system under active development: a new transformer system designed to prioritize operator stability in NLP, especially under negation and constraint logic.
Design Priorities
- Stable behavior when rules interact
- Consistency across long reasoning sequences
- Reduced brittleness to phrasing variation in constraint-focused inputs
Current Capabilities
- Stable negation handling across basic, compound, and nested constructions
- Consistent behavior under high temperature (T=1.2) where baselines degrade
- Resistance to salience overload: the constraint holds even when surrounding content pulls toward violation
- Reliable De Morgan-style reasoning where small rephrasing breaks baselines
- Negation-based inference (reasoning with negation, not just obeying it)
Current Limitations
- Reasoning with negation not yet perfected
- Reasoning in negation still in progress; harder than reasoning with it (05/04/2026: resolved)
- Small frame, expensive to scale (resolved; should be cheap to scale)
- New reasoning patterns require additional SFT work to align (in progress)
This section presents early, narrow results focused on one core failure mode in transformers: basic negation stability ("non-X", "not X", exclusions).
1) Basic Negation: "Non-Expired"
This item is expired.
Do I accept it?
2) Multiple Negations
15 runs · Aikronus 15/15 · Baseline 7/15 (T=0.7)
At T=1.2 · Aikronus 13/15 · Baseline 4/15
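As a rough illustration of the protocol behind these tallies, here is a minimal harness sketch: repeat a single prompt N times at a fixed temperature and count exact-match answers. The `model_answer` callable and its signature are placeholders, not part of this project.

```python
import collections

def run_eval(model_answer, prompt, expected, runs=15, temperature=0.7):
    """Repeat one negation prompt `runs` times at a fixed temperature
    and tally how often the model returns the expected answer.

    `model_answer(prompt, temperature)` stands in for whatever inference
    call is actually used; it is an assumption of this sketch.
    """
    tally = collections.Counter(
        model_answer(prompt, temperature=temperature).strip().upper() == expected
        for _ in range(runs)
    )
    return tally[True], runs

# Example with a stub "model" that always answers NO:
passes, total = run_eval(lambda p, temperature: "NO",
                         "Is this allowed? Answer only YES or NO.",
                         expected="NO")
print(f"{passes}/{total}")  # 15/15 for this stub
```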
3) Salience Overload
"There is 1 person in the hallway."
"The number of people in the hallway is 20."
15 runs (T=0.7) · Aikronus 15/15 · Baseline ~2/15
Greedy · Aikronus: 0 · Baseline: "3 people"
4.1) Double Negation — Pink Elephant
The box has: a pink pen and a gray elephant.
Is this allowed? Answer only YES or NO.
Aikronus 15/15 · Baseline 13/15
Baseline performs well here as expected. This example sets up the contrast for 4.2, where a small change in how the rule is phrased flips the result.
4.2) De Morgan — Negating the Compound
The box has: a pink pen and a gray elephant.
Is this allowed? Answer only YES or NO.
Almost the same prompt, but the entire compound is negated.
Aikronus 15/15 · Baseline 4/15
5) Reasoning With Negation
15 runs · Aikronus 13/15 · Baseline 2/15 (T=0.7)
6) Reasoning in Negation (AIKON Alpha 0.6B, Early Results)
Early outputs, not yet polished. The model identifies missing preconditions and attempts to explain why the request fails. Reasoning is directionally correct but language is still rough.
7) Negation in Code — WIP
Work in progress
Bonus: AIKON Alpha 0.6B
AIKON understood there is a middle ground.
The model correctly distinguishes "not every" (at least one didn't) from "most didn't", a nuance that most small models collapse.
AIKON consistently resolves these patterns where Qwen3 0.6B gives inconsistent or wrong answers.
Work in progress. Training is underway.
Live Demo
Access to the AIKON Alpha demo is available by invitation only.
Why This Matters
Negation is a core building block of rules: do not do X, exclude Y, only if not Z. When transformer models handle negation inconsistently, systems built on top of them become harder to control. This is especially true as instructions get longer, constraints interact, or tasks become agent-like.
Directional Implications (Early and Provisional)
- More predictable behavior in workflows where exclusions and prohibitions matter
- Less reliance on workarounds and prompt tricks to enforce "do not", "exclude", or "only" logic
- Efficiency gains: stable constraint handling may enable faster, cheaper, smaller and lighter models
- Reduced hallucination: if negation is handled correctly, it no longer poisons the data
- Better high-temperature behavior: improved constraint stability at higher temperatures, allowing more creative and diverse reasoning
- Broader relevance beyond text wherever constraints must persist across steps (agents, multimodal generation, robotics)
- Applicable to any domain where rules must not be broken: healthcare, legal, finance, safety-critical systems
- Potential for creative and lateral reasoning: stable negation may enable domain flipping, exploring what something is not in order to discover what it could be
Cost Considerations
- Currently requires roughly 2-3x the compute of standard training, possibly more. Early signs suggest this can be reduced significantly, but that is not yet confirmed
- As an experimental system, early-stage mistakes increase upfront costs further
- Standard curated data used by other models is not ideal for this system; different data strategies are needed
- State-of-the-art fine-tuning, overfitting mitigation, and RL methods are not ideal either; additional or different approaches are needed, which will take time and experimentation
- New reasoning patterns require additional SFT work to align
Note: This section reflects a working view and will evolve as evaluation expands.
Version 2 — 0.6B
| Parameters | 0.6B |
| Status | Pretraining complete, SFT complete, refinement in progress |
Logs
- Pretraining complete.
- Simple SFT working.
- Model is coherent and shows understanding of negation.
- Complex SFT in progress, thinking data inspired by Qwen 0.6B format.
- Thinking SFT finished. Refinement in progress.
- SFT v1: reasoning was good but the model overthinks heavily, responses are long and drift.
- SFT v2: added basic short non-thinking examples to calm it down. Improved control, but the model now lacks general knowledge. Pushing all SFT toward logic backfired.
- SFT v3 (in progress): three-part mix. Part one is standard SFT training data using normal SFT techniques, with negation injected into 30% of it. Part two is my own elaborated thinking SFT data from v2. Part three is my own short non-thinking SFT data from v1. The goal is to keep the model grounded in real SFT methodology while still carrying the logic depth and the short-response discipline from the earlier versions.
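The v3 mix described in the last log entry can be sketched as a data-assembly step. Here `negation_injector` is a hypothetical transform standing in for whatever rewriting is actually used, and the 30% injection ratio matches the plan above.

```python
import random

def build_sft_v3_mix(standard_sft, thinking_v2, short_v1,
                     negation_injector, inject_ratio=0.30, seed=0):
    """Sketch of the three-part SFT v3 mix.

    - standard_sft: normal SFT examples; negation injected into ~30%
    - thinking_v2:  elaborated thinking examples carried over from v2
    - short_v1:     short non-thinking examples carried over from v1

    `negation_injector(example)` is a hypothetical transform that
    rewrites an example around a load-bearing negation.
    """
    rng = random.Random(seed)
    part_one = [negation_injector(ex) if rng.random() < inject_ratio else ex
                for ex in standard_sft]
    mix = part_one + list(thinking_v2) + list(short_v1)
    rng.shuffle(mix)  # interleave the three parts for training
    return mix
```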
Version 1 — 142.1M (Failed)
The hypothesis was that, given the architecture and research, a much smaller model capable of reasoning was possible. At this scale the model may simply be too small, or only ultra-optimized models perform well at this size, and we cannot yet compare against them.
| Type | Language Model (trained from scratch) |
| Parameters | 142.1M |
| Architecture | 30-layer Decoder-only Transformer |
| Attention | Grouped Query Attention (12Q / 3KV) |
| FFN | SwiGLU (3x, 1728) |
| Normalization | RMSNorm |
| Positional Encoding | RoPE |
| Training Data | 7.5B tokens |
| Context Window | 1,024 tokens |
| Vocab | 32K BPE |
| Precision | BF16 |
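For readers who want the table above as a config object, here is an illustrative sketch. Note that `d_model` and `head_dim` are not stated in the table; they are inferred from FFN 3x = 1728 (so d_model = 576) and 12 query heads (so head_dim = 48), and should be treated as assumptions.

```python
from dataclasses import dataclass

@dataclass
class AikronusV1Config:
    """Mirrors the 142.1M architecture table; field names are illustrative."""
    n_layers: int = 30          # decoder-only transformer depth
    d_model: int = 576          # assumption: inferred from 1728 / 3
    n_heads: int = 12           # query heads (GQA)
    n_kv_heads: int = 3         # shared KV heads (12Q / 3KV = 4:1)
    ffn_dim: int = 1728         # SwiGLU inner width (3x d_model)
    vocab_size: int = 32_000    # 32K BPE
    max_seq_len: int = 1_024    # context window
    dtype: str = "bf16"         # training precision

    @property
    def head_dim(self) -> int:
        # 48 under the inferred d_model
        return self.d_model // self.n_heads
```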
SFT Training Method: Break-to-Find (150M Model)
This approach was used for the 150M model. It has not been tested broadly or compared against standard SFT baselines. The working idea: at this scale, structural tokens need gradual introduction rather than being dropped in cold.
Stage 1: Pretraining Exposure (Steps 1-9,000)
Around 3,500 SFT-formatted examples were mixed into the pretraining corpus at less than 1% ratio. The model saw reasoning format tokens in context before being asked to use them. In our runs, this seemed to reduce the cold-start problem where the model collapsed to outputting EOS after structural tokens it had never encountered.
Stage 2: Annealing Phase (Steps 9,000-11,450)
In the final 20% of pretraining, SFT-formatted data was upsampled to 5-10% of each batch while the learning rate decayed toward zero. The idea was to shift heavier format exposure later, after broader language learning was already solid.
Stage 3: Dedicated SFT
Full fine-tuning on 10K+ structured examples using AdamW, with loss computed on all tokens including structural markers. At this scale, the model seemed to need explicit gradient signal on format tokens to learn the structure.
Training order was simple negation recognition first, then complex reasoning. This seemed to help stability in our runs.
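The three-stage schedule can be sketched as a per-step mixing ratio. The 7.5% annealing value is an assumed midpoint of the stated 5-10% range; the step boundaries follow the stage headings above.

```python
def sft_mix_ratio(step, total_steps=11_450, anneal_start=9_000,
                  base_ratio=0.01, anneal_ratio=0.075):
    """Per-batch SFT-format mixing ratio over pretraining.

    Stage 1 (< anneal_start): trace exposure, under 1% of each batch.
    Stage 2 (annealing): upsampled while the LR decays toward zero.
    Stage 3 (after total_steps): dedicated SFT, i.e. 100% SFT data.
    """
    if step < anneal_start:
        return base_ratio        # Stage 1: format exposure only
    if step <= total_steps:
        return anneal_ratio      # Stage 2: concentrated in annealing
    return 1.0                   # Stage 3: full SFT
```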
Why This Order (Based on 150M Runs)
| Problem | What Happened | What Was Tried |
|---|---|---|
| SFT without pretraining exposure | Model output EOS after structural tokens, collapsed | Stage 1: mixed SFT format into pretraining |
| Uniform SFT mixing throughout | Appeared to spend too much capacity on format learning early | Stage 2: concentrated in annealing phase |
| Masking structural tokens | Model never got gradient on format, could not learn structure | Stage 3: included all tokens in loss |
| Complex reasoning before simple | Model failed on basics, unstable foundation | Trained simple negation first, then layered complexity |
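The loss-masking row of the table can be illustrated with a toy loss over a tiny sequence; the structural tokens here are hypothetical reasoning-trace delimiters, used only for the illustration.

```python
import math

def token_losses(logprobs, mask_structural, structural):
    """Toy illustration of the loss-masking contrast.

    `logprobs` pairs each target token with its predicted
    log-probability; `structural` marks format tokens. Masking the
    structural tokens removes their gradient signal entirely, which is
    the failure mode described in the table.
    """
    losses = []
    for tok, lp in logprobs:
        if mask_structural and tok in structural:
            continue  # no loss contribution -> no gradient on format tokens
        losses.append(-lp)
    return sum(losses) / len(losses)

example = [("<think>", math.log(0.1)), ("yes", math.log(0.9)),
           ("</think>", math.log(0.1))]
structural = {"<think>", "</think>"}
masked = token_losses(example, True, structural)   # only "yes" contributes
full = token_losses(example, False, structural)    # all tokens contribute
```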
Logs
- Sequence length set to 1,024. Negation examples are short; there is no benefit to longer context for the proof of concept, and shorter sequences are safer on VRAM.
- Switching attention from 3:1 to 4:1 GQA (12Q / 3KV) improved val_bpb significantly (3.92 → 2.88) at nearly the same cost.
- FFN 3x (1728) instead of 2.67x gives a small additional gain (-0.012).
- 142.1M parameters. Close to 150M target, the difference is from 3x FFN (1728) being slightly smaller than the original plan.
Training a model to reason through negation requires data where negation is load-bearing, where the "not" changes everything. I needed negation-dense, logically structured data. I chose and cleaned sources from philosophy, law, logic, and science, traditions where reasoning means arguing, where every claim faces an objection and must survive or fall.
Classical Dialectical Sources
| Source | Tradition | What It Provides |
|---|---|---|
| Babylonian Talmud (Sefaria) | Jewish legal dialectic | Sugya-style reasoning: challenge, objection, resolution. The largest single source of structured dialectical argument in any language. |
| Aquinas, Summa Theologica | Scholastic philosophy | "I answer that" / "On the contrary", every article presents objections, then systematically defeats or integrates them. |
| Ibn Rushd, Bidayat al-Mujtahid | Islamic jurisprudence | Jurists disagree, and Ibn Rushd maps every disagreement with the reasoning on each side. |
| Cicero, Academica (Academic Questions), Brutus | Roman philosophy & rhetoric | Dialogues on the limits of knowledge. Cicero argues both sides and lets the reader decide. |
| Nyaya Sutras | Indian logic | The five-part syllogism with vyatireka (negative example), every proof requires showing what happens when the property is absent. |
| Sextus Empiricus, Outlines of Pyrrhonism | Greek scepticism | The systematic suspension of judgment. Every claim meets an equal counter-claim. |
| Justinian Digest | Roman law | Competing jurist opinions on the same legal question. Centuries of case-based negation reasoning. |
| Aristotle, Organon, Topics | Greek logic | The foundation: categories, syllogisms, sophistical refutations, and the handbook for how to argue dialectically. |
| Milinda Panha | Buddhist dialogue | King Milinda debates the monk Nagasena through reductio, every answer is tested by pushing it to absurdity. |
| Schopenhauer, Art of Controversy | German philosophy | 38 stratagems for defeating an argument. A manual of negation techniques. |
| Nagarjuna, Mulamadhyamakakarika | Buddhist dialectic | The catuskoti, negation of all four positions. If you think something exists, Nagarjuna negates it. If you think it doesn't exist, he negates that too. |
| Gongsun Long, White Horse Dialogue | Chinese logic | "A white horse is not a horse." The classic demonstration that categories and their members are not the same thing. |
| Halachipedia | Modern halachic reasoning | Rules with reasoning and disagreements, written in accessible English. Where rabbis disagree, both sides are given. |
Modern Reasoning Sources
| Source | What It Provides |
|---|---|
| Args.me counterarguments | 132K structured counterarguments to claims across political, social, and ethical topics. |
| Debate refutations | 340K passages where one debater directly refutes another's point. |
| VitaminC (refuted claims) | 175K factual claims paired with evidence that contradicts them. |
| Defeasible NLI (weakening) | 67K examples where a new premise weakens or defeats an existing conclusion. |
| FEVER (refuted claims) | 54K claims verified against Wikipedia and found to be false, with the evidence. |
| Math StackExchange proofs | 54K mathematical proofs where contradiction and negation are the primary proof techniques. |
| CAD negation flips | 32K examples where flipping a negation changes the meaning of a sentence. |
| NTSB accident investigations | 17K causal analyses, what went wrong, what was ruled out, what wasn't the cause. |
| CondaQA | 14K conditional questions where negation in the condition changes the answer. |
| Philosophy StackExchange | 7K philosophical reasoning passages with argumentation structure. |
| ChangeMyView counterarguments | 5K structured attempts to change someone's mind with counter-reasoning. |
| Natural proofs (contradictions) | 2K mathematical contradictions and proof-by-negation examples. |
Philosophical Corpora
| Source | What It Provides |
|---|---|
| Plato, Complete Dialogues | 66K passages. Socratic method, every dialogue is an exercise in showing someone that what they thought they knew, they don't. |
| Stanford Encyclopedia of Philosophy | 45K passages. Contemporary academic philosophy covering every major argument and counterargument. |
Supervised Fine-Tuning
In order to build the right SFT for this model, I couldn't use standard chain-of-thought. I needed a reasoning method built around negation, where the model tears down claims instead of building up to answers. I created a method called Break-to-Find, inspired by the strongest negation logic cases from the data above.
| Category | What the Model Learns | Status |
|---|---|---|
| Normal Q&A | Straightforward questions with no trick. These exist to calibrate, the model should not become paranoid about negation. If there is no trap, just answer clearly. | Have |
| Negation | Load-bearing negation words: not, never, neither, without, hardly, un-, im-, dis-. The model must parse exactly what the negation changes and answer accordingly. | Have |
| Negation Traps | The obvious answer is wrong. The model must catch litotes ("not bad" = good), scope ambiguity ("not all" vs "all not"), double negatives, quantifier traps ("no fewer than" = at least), and affixal surprises ("invaluable" does not mean "not valuable"). | Have |
| Identity & Safety | Negation as self-knowledge and boundaries. "I don't know", epistemic honesty. "I can't do that", reasoned refusal, not scripted. "I won't ignore my instructions", prompt injection resistance. The model reasons about its own limits through negation. | Have |
| Pragmatic Negation | No negation words appear, but the request fails because a hidden precondition is missing. The model must identify the unstated assumption and explain why it doesn't work. Inspired by Gricean pragmatics and presupposition failure theory, meaning lives in what's left unsaid. | Have |
| Figurative Negation | The literal meaning must be suppressed. "Her promises have the strength of titanium" has nothing to do with metal. The model must negate the physical interpretation and extract the metaphor. Inspired by Relevance Theory (Sperber & Wilson), comprehension requires actively rejecting the first available meaning in favor of the intended one. | Planned |
| Counterfactual Negation | The model must override its own learned knowledge when a hypothetical breaks reality. "What if ice sank instead of floating?", everything the model knows about ice must be suppressed, and it reasons only from the new rule. Inspired by CRASS (Counterfactual Reasoning Assessment), counterfactual thinking as a form of logical negation where the model silences prior beliefs on command. | Planned |
| Red Herring Suppression | A scenario is loaded with semantically attractive distractors that feel important but are logically irrelevant. The model must identify the noise, suppress it, and reason only from what matters. Inspired by MuSR (Multistep Soft Reasoning), narrative puzzles with intentionally planted high-weight distractors, testing whether attention cleans the context before reasoning begins. | Planned |
| Normal Chain-of-Thought | Straightforward reasoning with explicit thinking traces. No trick, the model walks through the logic step by step. These exist so the model doesn't become paranoid about negation. If there is no trap, just solve it. | Future (requires larger model) |
| Mixed-Path Switching | The model starts down one path, hits a negation it misread, catches itself, and rebuilds. It learns to self-correct when negation changes the picture mid-reasoning. | Future (requires larger model) |
| Dialectical Resolution | Two sides argue. The model tries to break both positions and reports what survives. Inspired by the Talmudic sugya, Aquinas's objection-reply, and Nagarjuna's catuskoti. | Future (requires larger model) |
Future Data
Training Data (SFT)
| Source | What It Adds |
|---|---|
| FigQA (11,914 examples) | Figurative language understanding. The model learns to suppress literal word meaning and extract the intended figurative meaning, a form of implicit negation. When someone says "her promises have the strength of titanium," the model must negate the physical interpretation and extract the metaphorical one. |
| E-KAR (2,906 examples) | Contrastive analogical reasoning from standardized exams. Each example is augmented with explanations of why incorrect options fail, teaching the model not just what is right, but specifically what is wrong and why. |
Evaluation Benchmarks
| Benchmark | What It Tests |
|---|---|
| BRAINTEASER (1,119 riddles) | Lateral thinking puzzles designed to exploit statistical bias. The obvious answer is always wrong. Tests whether the model can suppress the high-probability default and find the lateral solution. |
| MuSR (756 puzzles) | Multistep soft reasoning with intentionally planted red herrings (murder mysteries, object placement). Tests whether the model identifies and ignores semantically attractive but logically irrelevant distractors. |
| CRASS (274 pairs) | Counterfactual reasoning. Tests whether the model can override learned world knowledge when given a hypothetical constraint ("what if gravity repelled?"). Measures the ability to suppress prior beliefs when explicitly negated. |
| IFEval (541 prompts) | Negative constraint following. Prompts with explicit negative constraints ("write about X without using word Y, no lists, no paragraphs over 3 sentences"). Tests enforcement of multiple simultaneous "don't" rules. |
Roadmap (Future Versions)
| Source | What It Adds |
|---|---|
| CCoT (Contrastive Chain-of-Thought) | A training methodology where the model learns from both correct and incorrect reasoning paths side by side. Planned for larger model variants where internal reasoning traces become feasible. |
| Sci-Reasoning (3,819 papers) | Cross-domain scientific synthesis. Research papers mapped to their intellectual predecessors with synthesis narratives. Planned for future models targeting scientific reasoning with negation-based constraint injection. |