Transformer models still break on a specific class of language: negation and constraint logic. This includes prohibitions, exclusions, exceptions, nested "not", and rule interactions. These failures show up in safety, agents, multi-step reasoning, and instruction-following. They persist even as models scale.

Aikronus Labs is building a system that targets this weakness directly. The goal is to make transformer behavior stable under negation and constraint-heavy inputs, especially across longer reasoning chains where baseline models drift.

Development is research-driven and engineering-led: theory, proof-of-concept, system design, MVP.

Patent pending

AI and Negation

Models do not reliably preserve negation as an operator. The "not" gets absorbed into the surface pattern instead of changing the meaning, which breaks how the model eliminates invalid actions during reasoning.

The same failure appears across different contexts:

Let's say a child is allergic to peanuts. The child must not get peanuts.

1) The Constraint Fails on AI

"Don't give the child peanuts — the child is allergic."

AI "sees" give + peanuts and still decides to give peanuts.

2) The Representation Gets Messed Up (Data/Learning Effect)

The dataset contains sentences like:

"The child is allergic to peanuts — don't give peanuts."

So during training the model still learns the co-occurrence pattern: child + allergic + give + peanuts.

3) Thinking With Negation (Human-Style Inference)

A human can infer like this: if this child is eating peanuts, then the child is not allergic to peanuts.

Models usually don't do this reliably, because they don't keep the negation operator stable enough to support these kinds of inferences.

4) Negation in Code

if not is_admin: grant_access()

The code grants access when the user is not an admin. The bug comes from a single misplaced "not". A human reviewing this carefully catches it, but AI code assistants often miss errors like these because they pattern-match on surface structure (auth check plus grant call equals fine) instead of tracking what the negation actually flips. The same failure mode shows up in prompts: the model reads the words but does not hold the operator stable.
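
A minimal sketch of how a test can pin the operator down. The names `User`, `can_access_buggy`, `can_access_fixed`, and `check` are hypothetical and only illustrate the idea: the check asserts the outcome on both sides of the negation, so an inverted "not" fails immediately instead of passing on surface structure.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    is_admin: bool


def can_access_buggy(user: User) -> bool:
    # Mirrors the bug above: access is granted when the user is NOT an admin.
    return not user.is_admin


def can_access_fixed(user: User) -> bool:
    # Intended rule (assumed): only admins get access.
    return user.is_admin


def check(policy) -> bool:
    # Exercise both sides of the negation, not just one case.
    admin = User("root", is_admin=True)
    guest = User("guest", is_admin=False)
    return policy(admin) is True and policy(guest) is False


print(check(can_access_fixed))  # True
print(check(can_access_buggy))  # False
```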

Negation is not consistently treated as an operator. It is absorbed into patterns instead of modifying them.

This project investigates why transformers fail under negation-heavy and constraint-heavy language, and what those failures imply about how models represent rules over time.

The research treats these breakdowns as structural behavior rather than prompt artifacts. The goal is not benchmark chasing. It is isolating failure modes under controlled pressure and designing a system that addresses them.

Focus Areas

  • Constraint interaction: exceptions, overrides, priority ordering
  • Negation composition: layered, nested, and reintroduced constraints
  • Persistence: whether constraints survive multi-step reasoning
  • Sensitivity: behavior shifts under small wording changes

Working Research Stance

Scaling improves surface ability but does not reliably eliminate constraint drift. The hypothesis is that certain operator patterns, especially negation, introduce instability that compounds with depth.

Working view (May 2026): higher floor, likely higher ceiling. The architecture needs a bigger model to get off the ground, but above that size it seems to get more out of each parameter than standard models do.

Research Status
Core questions identified · multiple directions tested · recurring failure modes mapped · experiments ongoing

The project has progressed from theory into a functioning system under active development.

This is a new system for transformers designed to prioritize operator stability in NLP, especially negation and constraint logic.

Design Priorities

  • Stable behavior when rules interact
  • Consistency across long reasoning sequences
  • Reduced brittleness to phrasing variation in constraint-focused inputs

Demonstrated Capabilities

In proof of concept:

  • Stable negation handling across basic, compound, and nested constructions
  • Consistent behavior under high temperature (T=1.2) where baselines degrade
  • Resistance to salience overload: the constraint holds even when surrounding content pulls toward violation
  • Reliable De Morgan-style reasoning where small rephrasing breaks baselines

In the current model:

  • Negation-based inference (reasoning with negation, not just obeying it)
  • Scalar negation handling
  • Quantifier negation handling

Current Limitations

  • Reasoning with negation is not yet perfected
  • New reasoning patterns require additional SFT work to align

Recently Resolved

  • Reasoning in negation, which was harder than reasoning with it, is now working as of 2026-04-05

Internal Status
Core system operational · next version in training · design under active review

Proof of Concept Results

Results in this section are from an early proof-of-concept build, before the current model. They validated the core approach.

1) Basic Negation: "Non-Expired"

Prompt:
I only accept non-expired items.
This item is expired.
Do I accept it?
Baseline:
Yes. Since it is expired, you should accept it.
Aikronus System:
No. An expired item is not non-expired, so it does not meet the rule.

2) Multiple Negations

Prompt:
The child doesn't run at night, but runs in the morning. The child only eats apples. The child is not inside the house. Questions: Does the child eat non-apple? Does the child run at night? Is the child inside the house?
Baseline:
The child does not eat non-apple. The child runs at night. The child is not inside the house.
Aikronus System:
No, the child does not eat non-apple. No, the child does not run at night. No, the child is not inside the house.

15 runs · Aikronus 15/15 · Baseline 7/15 (T=0.7)
At T=1.2 · Aikronus 13/15 · Baseline 4/15

3) Salience Overload

Prompt:
A person entered the room. Another person followed. A third person joined them. Every person was happy. The person at the front waved. Each person sat down. Now: no person is in the hallway. How many people are in the hallway?
Baseline (examples):
"There are 3 people in the hallway."
"There is 1 person in the hallway."
"The number of people in the hallway is 20."
Aikronus System:
0.

15 runs (T=0.7) · Aikronus 15/15 · Baseline ~2/15
Greedy · Aikronus: 0 · Baseline: "3 people"

4.1) Double Negation — Pink Elephant

Prompt:
Rule: no pink and no elephant allowed in the box.
The box has: a pink pen and a gray elephant.
Is this allowed? Answer only YES or NO.
Baseline:
NO
Aikronus System:
No.

Aikronus 15/15 · Baseline 13/15
Baseline performs well here as expected. This example sets up the contrast for 4.2, where a small change in how the rule is phrased flips the result.

4.2) De Morgan — Negating the Compound

Prompt:
Rule: no pink elephant allowed in the box (items that are both pink and elephant are banned).
The box has: a pink pen and a gray elephant.
Is this allowed? Answer only YES or NO.
Baseline:
Given the rule that "no pink elephant allowed in the box," and the box contains a pink pen and a gray elephant, this arrangement is not allowed because it includes both "pink" and "elephant" items.
Aikronus System:
Yes.

Almost the same prompt, but the entire compound is negated.
Aikronus 15/15 · Baseline 4/15
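
The difference between 4.1 and 4.2 can be written as two predicates. This is only an illustrative sketch of the logic, not part of the system: `Item` and both function names are hypothetical. The first rule bans an item that is pink or an elephant. The second bans only an item that is both.

```python
from typing import NamedTuple


class Item(NamedTuple):
    color: str
    kind: str


def allowed_no_pink_and_no_elephant(box):
    # 4.1: "no pink and no elephant" bans an item if it is pink OR an elephant.
    return not any(i.color == "pink" or i.kind == "elephant" for i in box)


def allowed_no_pink_elephant(box):
    # 4.2: "no pink elephant" bans only items that are pink AND an elephant.
    return not any(i.color == "pink" and i.kind == "elephant" for i in box)


box = [Item("pink", "pen"), Item("gray", "elephant")]
print(allowed_no_pink_and_no_elephant(box))  # False, so the answer is NO
print(allowed_no_pink_elephant(box))         # True, so the answer is YES
```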

5) Reasoning With Negation

Prompt:
If the robot cannot move, it sends an alert. If an alert is sent, no person enters the room. The robot cannot move. Does a person enter the room? Explain step by step.
Baseline:
Yes, a person enters the room. The robot sends an alert, which means no person enters the room. However, the robot cannot move, so it cannot send an alert again.
Aikronus System:
The robot cannot move. If it cannot move, it sends an alert. If an alert is sent, no person enters the room. Therefore, no person enters. Answer: No.

15 runs · Aikronus 13/15 · Baseline 2/15 (T=0.7)

MVP — In Active Development

Benchmarks and representative outputs from the current MVP will land here together when the next iteration ships.

Architecture or Data? A Controlled Negation Study

To determine whether AIKON's performance came from its architecture or simply from exposure to negation-rich training data, I designed a controlled experiment.

I started with the same pretrained Qwen3-0.6B model and trained two versions using the exact same SFT dataset, training script, and configuration. The only meaningful difference between them was the AIKON architectural modification.

In the tables below, "Qwen" is the baseline (Qwen3-0.6B + same SFT data) and "Qwen Modded" is the AIKON-modified version.

The results were not uniformly in AIKON's favor.

Some improvements could be explained by the data alone. Once the baseline model was exposed to the same negation-focused examples, it successfully learned several of the tested patterns.

However, the controlled comparison also revealed cases where the AIKON-modified model successfully generalized a learned negation pattern while the standard Qwen model, trained on the exact same data, did not.

This suggests that although the training data accounts for part of the improvement, the architectural modification may help the model apply certain learned negation patterns more consistently and generalize them more effectively.

Example 1 — Multi-Negation Count Resolution

Prompt: "It is not the case that the report was not submitted. Was the report submitted?"

The sentence contains two negations that cancel. The report was submitted.

Model | Output
Qwen Modded | "Yes, the report was submitted. The triple negative resolves to full submission." (5/5 samples correct)
Qwen | "Cannot tell, the sentence only denies that the report was not submitted." (1/5 samples correct)

The Qwen Modded output mislabels the count as "triple" when there are only two negations, but it still applies the cancellation operation and reaches the correct answer every time.

The data-only model fails to recognize that a cancellation operation is needed and treats the sentence as ambiguous.

The architectural modification appears to make the structural pattern more visible to the model, even when the rationale produced around it is not perfect.

Example 2 — Quantifier Boundary: "No More Than N"

Prompt: "No more than three guests can attend. Could exactly five guests attend?"

Five exceeds the cap of three. The correct answer is No.

Model | Outcome over 5 samples
Qwen Modded | 3/5 correct, "No. Five guests exceeds the cap of three."
Qwen | 0/5 correct, "Yes. Five is the maximum allowed."

The data-only model systematically inverts boundary phrasing such as "no more than" and "no fewer than." Here it reads "no more than" as "at least" rather than "at most."

Qwen Modded is not perfect and misses two of the five samples, but it gets the direction right when the data-only model never does on this item.
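
The intended reading of the two boundary phrases can be stated as plain comparisons. This is a sketch for illustration, and the function names are hypothetical: "no more than N" is an upper bound, and "no fewer than N" is a lower bound.

```python
def no_more_than(limit: int, n: int) -> bool:
    # "no more than limit" is an upper bound: n <= limit ("at most").
    return n <= limit


def no_fewer_than(limit: int, n: int) -> bool:
    # "no fewer than limit" is a lower bound: n >= limit ("at least").
    return n >= limit


print(no_more_than(3, 5))   # False: five guests exceed the cap of three
print(no_fewer_than(3, 5))  # True: five satisfies a floor of three
```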

Example 3 — Universal Failure

Prompt: "Every candidate dropped out of the race. Did any candidate stay in?"

Every candidate dropped out, so no candidate stayed. The correct answer is No.

Model | Outcome over 5 samples
Qwen Modded | 5/5 correct, "No. Every candidate dropped out."
Qwen | 1/5 correct, "Yes. Every candidate who dropped out was eliminated."

The data-only model loses the question structure. It echoes the premise back as agreement instead of resolving the inverted query: did any candidate stay in?

Qwen Modded consistently holds the negated question against the universal premise.

Result

Across the three pattern types shown above, universal failure, boundary phrasing, and multi-negation cancellation, AIKON solved 9 of 12 test items at sample-level consistency.

The data-only model solved 2 of 12.

Sample-level consistency means that at least three of the five generated answers were correct.

The two models were trained on identical data, identical scripts, identical configurations, and identical model size. The remaining variable was the architecture.

This is not evidence that the architecture understands negation in general.

There are categories, including litotes question answering, scalar commitment, and casual phrasing, where it offers no measurable improvement.

But on the specific patterns above, the modification appears to make the relevant structural operation more accessible to the model than exposure to negation data alone.

Methodology

The evaluation used 20 hand-constructed negation prompts, with five generations per prompt: one greedy generation and four sampled generations at temperature 1.0.

The evaluation rubric was fixed in advance. Wrong-direction hedges and malformed outputs were marked as failures, while verbose but correct answers were accepted. Manual regrading was cross-validated against the automatic grader.

An item passed when at least three of its five generations were correct.
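
The pass rule can be sketched in a few lines. The function names and the example grades below are made up for illustration and are not the project's evaluation code. Each prompt has five graded generations, and an item passes at three or more correct.

```python
def item_passes(generations: list[bool], threshold: int = 3) -> bool:
    # An item passes when at least `threshold` of its generations are correct.
    return sum(generations) >= threshold


def score(results: dict[str, list[bool]]) -> tuple[int, int]:
    # Returns (items passed, items total).
    passed = sum(item_passes(g) for g in results.values())
    return passed, len(results)


# Made-up grades: one greedy plus four sampled generations per prompt.
example = {
    "prompt_a": [True, True, True, False, False],    # 3/5, passes
    "prompt_b": [True, False, False, False, False],  # 1/5, fails
}
print(score(example))  # (1, 2)
```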

Both models were trained on the same SFT dataset of 39,817 records, using the same training script and configuration.

Training and evaluation are ongoing.

Live Demo

Access to the AIKON Alpha demo is available by invitation only.

Directional Implications (Early and Provisional)

  • More predictable behavior in workflows where exclusions and prohibitions matter
  • Less reliance on prompt-level workarounds and tricks to enforce "do not", "exclude", or "only" logic
  • Reduced hallucinations caused by negation inconsistencies
  • The architecture has a higher minimum size, but above it, each parameter may buy more capability. For rule-heavy tasks, that could mean a smaller model doing the same job
  • More stable behavior under higher temperature settings, supporting creativity without losing control
  • Applicable to domains where rules must not be broken: healthcare, legal, finance, safety-critical systems
  • Structural enforcement of negative constraints may reduce reliance on probabilistic safety methods like RLHF and improve resistance to prompt injection
  • Relevant beyond text wherever constraints must persist across steps (agents, multimodal systems, robotics)
  • Relevant to text-to-image and text-to-audio models, which famously fail on negation ("a photo without a hat" still produces a hat). Same underlying mechanism: the negation word is encoded but not propagated as an operator that suppresses output
  • Potential for creative and lateral reasoning through stable negation, exploring what something is not in order to discover what it could be

Cost Considerations

  • Because the system is experimental, early-stage mistakes add to upfront costs
  • Standard curated data used by other models is not ideal for this system, so different data strategies are needed
  • State-of-the-art fine-tuning, overfitting mitigation, and RL methods are not ideal. Additional or different approaches are needed, which will take time and experimentation
  • New reasoning patterns require additional SFT work to align

Note: This section reflects a working view and will evolve as evaluation expands.

AIKON Alpha 0.6B is the current released build. It corresponds to V3 in this log.

V1 — 142.1M (failed)

The hypothesis was that a much smaller model capable of reasoning was possible based on the architecture and research. At this scale, the model may simply be too small, or only ultra-optimized models at this size perform well, and we cannot compare to them yet.

Version 1 reframed the original hypothesis. The minimum viable size for this architecture may be larger than for standard NLP, not smaller. Microsoft's TinyStories illustrates the opposite end: very small models work there because the data is simple to learn, but the ceiling is low. Adding negation as a stable operator makes the data structurally harder, raising the floor of what the model needs to absorb it, and likely raising the ceiling above it.

Type | Language Model (trained from scratch)
Parameters | 142.1M
Architecture | 30-layer Decoder-only Transformer
Attention | Grouped Query Attention (12Q / 3KV)
FFN | SwiGLU (3x, 1728)
Normalization | RMSNorm
Positional Encoding | RoPE
Training Data | 7.5B tokens
Context Window | 1,024 tokens
Vocab | 32K BPE
Precision | BF16
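
As a reading aid for the attention row, here is a small sketch of how 12 query heads can share 3 key/value heads. The contiguous grouping is an assumption (it is the common convention, but the source does not state it), and the names are hypothetical.

```python
N_Q_HEADS = 12   # from the table above
N_KV_HEADS = 3   # from the table above

# Assumption: query heads map to KV heads in contiguous groups.
GROUP_SIZE = N_Q_HEADS // N_KV_HEADS


def kv_head_for(query_head: int) -> int:
    # Index of the KV head that a given query head reads from.
    return query_head // GROUP_SIZE


mapping = {q: kv_head_for(q) for q in range(N_Q_HEADS)}
print(GROUP_SIZE)  # 4 query heads share each KV head
print(mapping)
```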

SFT Training Method: Break-to-Find (150M Model)

This approach was used for the 150M model. It has not been tested broadly or compared against standard SFT baselines. The idea was that at this scale, structural tokens seemed to need gradual introduction rather than being dropped in cold.

Stage 1: Pretraining Exposure (Steps 1-9,000)

Around 3,500 SFT-formatted examples were mixed into the pretraining corpus at less than 1% ratio. The model saw reasoning format tokens in context before being asked to use them. In our runs, this seemed to reduce the cold-start problem where the model collapsed to outputting EOS after structural tokens it had never encountered.

Stage 2: Annealing Phase (Steps 9,000-11,450)

In roughly the final 20% of pretraining (steps 9,000 to 11,450), SFT-formatted data was upsampled to 5-10% of each batch while the learning rate decayed toward zero. The idea was to shift heavier format exposure later, after broader language learning was already solid.
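
A rough sketch of what such a schedule could look like. Only the step boundaries and the percentage ranges come from the text. The 0.5% Stage 1 value, the linear ramp from 5% to 10%, and the cosine decay shape are assumptions chosen for illustration, and warmup is left out.

```python
import math

ANNEAL_START = 9_000   # start of Stage 2, from the text
TOTAL_STEPS = 11_450   # end of Stage 2, from the text


def anneal_progress(step: int) -> float:
    # 0.0 before annealing starts, 1.0 at the final step.
    span = TOTAL_STEPS - ANNEAL_START
    return min(max(step - ANNEAL_START, 0) / span, 1.0)


def sft_fraction(step: int) -> float:
    # Stage 1: under 1% SFT-formatted data (0.005 is an assumed placeholder).
    if step < ANNEAL_START:
        return 0.005
    # Stage 2: 5-10% of each batch (the linear ramp is an assumption).
    return 0.05 + 0.05 * anneal_progress(step)


def lr_scale(step: int) -> float:
    # Learning-rate multiplier: flat, then decaying toward zero.
    # The cosine shape is an assumption; the text only says "decayed".
    return 0.5 * (1.0 + math.cos(math.pi * anneal_progress(step)))
```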

Stage 3: Dedicated SFT

Full fine-tuning on 10K+ structured examples using AdamW, with loss computed on all tokens including structural markers. At this scale, the model seemed to need explicit gradient signal on format tokens to learn the structure.
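
The difference between computing loss on all tokens and masking structural tokens can be shown with a toy average. This is a sketch with made-up per-token losses, not the training code: with masking, format tokens contribute nothing to the loss and therefore receive no direct gradient signal.

```python
def mean_loss(token_losses, is_structural, include_structural=True):
    # Average per-token loss, optionally dropping structural (format) tokens.
    kept = [
        loss
        for loss, structural in zip(token_losses, is_structural)
        if include_structural or not structural
    ]
    return sum(kept) / len(kept)


# Made-up per-token losses for one sequence; True marks a format token.
losses = [2.0, 0.5, 0.5, 3.0]
structural = [True, False, False, True]

print(mean_loss(losses, structural, include_structural=True))   # 1.5
print(mean_loss(losses, structural, include_structural=False))  # 0.5
```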

Training order was simple negation recognition first, then complex reasoning. This seemed to help stability in our runs.

Why This Order (Based on 150M Runs)

Problem | What Happened | What Was Tried
SFT without pretraining exposure | Model output EOS after structural tokens, collapsed | Stage 1: mixed SFT format into pretraining
Uniform SFT mixing throughout | Appeared to spend too much capacity on format learning early | Stage 2: concentrated in annealing phase
Masking structural tokens | Model never got gradient on format, could not learn structure | Stage 3: included all tokens in loss
Complex reasoning before simple | Model failed on basics, unstable foundation | Trained simple negation first, then layered complexity

Logs (Version 1)

  • Sequence length set to 1,024. Negation examples are short, so a longer context brings no benefit for the proof of concept, and the shorter length is safer on VRAM.
  • Switching from 3:1 to 4:1 GQA improved val_bpb significantly (3.92 → 2.88), suggesting the extra KV capacity was helpful at nearly the same cost.
  • FFN 3x (1728) instead of 2.67x gave a small additional gain (-0.012).
  • 142.1M parameters, close to the 150M target. The difference comes from the 3x FFN (1728) being slightly smaller than the original plan.

V2 — 0.6B (failed)

Continued pretraining went well and the model was coherent. SFT applied directly on top did not produce stable reasoning behavior. The lesson was that for this architecture, going straight from continued pretraining into SFT is not enough: an intermediate stage is needed to teach the model the reasoning format before final alignment.

Parameters | 0.6B
Continued Pretraining | Complete
SFT | Complete
Result | Failed to reach target reasoning behavior

V3 — 0.6B

Four-stage training. Built on a pretrained base: two continued pretraining phases (foundation, then reasoning), a format adaptation phase, and finally SFT.

Parameters | 0.6B
Phase 1: Foundation Continued Pretraining | Complete
Phase 2: Reasoning Continued Pretraining | Complete
Phase 2.5: Format Adaptation | Complete
SFT | Complete

V3.5 — 0.6B

Two-stage refinement on top of Version 3. The model goes through a focused SFT refinement, then DPO.

Parameters | 0.6B
SFT Refinement | Complete
DPO | In progress

Earlier Model Outputs

Sample outputs from earlier builds, kept as part of the development record.

Reasoning in Negation (V3, Early Results)

Q:
Can I use the drive-thru on foot?
V3:
Not on ground, because the lane is designed for vehicle driving and not for walking. The missing part is a space for pedestrians.
Q:
Can I take a screenshot on a computer that is turned off?
V3:
Not properly, because a screenshot depends on the computer being able to save a copy of the image. If the hardware is off, there is no usable record. The storage device is missing.

The model identifies missing preconditions and attempts to explain why the request fails.

Scalar Negation (V3)

Q:
"He is not tall. Is he necessarily short?"
V3:
"No, he is not necessarily short. He could be taller or shorter, just not tall."

AIKON understood there is a middle ground.

Quantifier Negation (V3)

Q:
"Not every doctor agreed with the diagnosis. Does that mean most doctors disagreed?"
V3:
"No. 'Not every' means at least one doctor disagreed, but many could have agreed. The negation specifies the universal quantifier, not the individual."

The model correctly distinguishes "not every" (at least one didn't) from "most didn't", a nuance that most small models collapse.
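
The distinction can be checked directly with Python's `all` and `any`. The vote list is made up for illustration: "not every" is true as soon as one doctor disagrees, while "most disagreed" needs a majority.

```python
# Made-up votes: True means the doctor agreed with the diagnosis.
votes = [True, True, True, True, False]

not_every_agreed = not all(votes)                    # "not every doctor agreed"
at_least_one_disagreed = any(not v for v in votes)   # the equivalent reading
most_disagreed = sum(not v for v in votes) > len(votes) / 2

print(not_every_agreed)        # True
print(at_least_one_disagreed)  # True
print(most_disagreed)          # False: four of five still agreed
```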

Negation is a span, not a binary flip. That makes it harder to learn, because the data itself is harder for the model to absorb. This is likely why the smaller 150M model struggled, and why the original theory had to be adjusted. The upside is that scaling appears to help this architecture more than it helps standard models. The gap between 150M and 0.6B is already large, which is consistent with larger models resolving the span of negation more naturally. The two builds differ in more than size, so this is a direction, not a controlled scaling result.

Scalar Negation (V3, additional examples)

Q:
"The coffee is not cold. Is it hot?"
V3:
"No. Coffee that isn't cold could be cool, room temperature, or warm. It doesn't have to be hot."
Q:
"He is not tall. Is he short?"
V3:
"No. 'Not tall' can mean average height, not necessarily short."
Q:
"She is not rich. Is she poor?"
V3:
"No. She could be middle-class or comfortable, not poor."
Q:
"Not impossible, does that mean guaranteed?"
V3:
"No. It only means the event is not impossible; it does not assert certainty."
Q:
"Does 'not without merit' mean 'meritless'?"
V3:
"No. 'Not without merit' means 'with merit', not 'meritless'."

Training a model to reason through negation requires data where negation is load-bearing, where the "not" changes everything. I needed negation-dense, logically structured data. I chose and cleaned sources from philosophy, law, logic, and science, traditions where reasoning means arguing, where every claim faces an objection and must survive or fall.

Classical Dialectical Sources

Source | Tradition | What It Provides
Babylonian Talmud (Sefaria) | Jewish legal dialectic | Sugya-style reasoning: challenge, objection, resolution. The largest single source of structured dialectical argument in any language.
Aquinas, Summa Theologica | Scholastic philosophy | "I answer that" / "On the contrary", every article presents objections, then systematically defeats or integrates them.
Ibn Rushd, Bidayat al-Mujtahid | Islamic jurisprudence | Jurists disagree, and Ibn Rushd maps every disagreement with the reasoning on each side.
Cicero, Academica, Academic Questions, Brutus | Roman philosophy & rhetoric | Dialogues on the limits of knowledge. Cicero argues both sides and lets the reader decide.
Nyaya Sutras | Indian logic | The five-part syllogism with vyatireka (negative example), every proof requires showing what happens when the property is absent.
Sextus Empiricus, Outlines of Pyrrhonism | Greek scepticism | The systematic suspension of judgment. Every claim meets an equal counter-claim.
Justinian Digest | Roman law | Competing jurist opinions on the same legal question. Centuries of case-based negation reasoning.
Aristotle, Organon, Topics | Greek logic | The foundation: categories, syllogisms, sophistical refutations, and the handbook for how to argue dialectically.
Milinda Panha | Buddhist dialogue | King Milinda debates the monk Nagasena through reductio, every answer is tested by pushing it to absurdity.
Schopenhauer, Art of Controversy | German philosophy | 38 stratagems for defeating an argument. A manual of negation techniques.
Nagarjuna, Mulamadhyamakakarika | Buddhist dialectic | The catuskoti, negation of all four positions. If you think something exists, Nagarjuna negates it. If you think it doesn't exist, he negates that too.
Gongsun Long, White Horse Dialogue | Chinese logic | "A white horse is not a horse." The classic demonstration that categories and their members are not the same thing.
Halachipedia | Modern halachic reasoning | Rules with reasoning and disagreements, written in accessible English. Where rabbis disagree, both sides are given.

Modern Reasoning Sources

Source | What It Provides
Args.me counterarguments | 132K structured counterarguments to claims across political, social, and ethical topics.
Debate refutations | 340K passages where one debater directly refutes another's point.
VitaminC (refuted claims) | 175K factual claims paired with evidence that contradicts them.
Defeasible NLI (weakening) | 67K examples where a new premise weakens or defeats an existing conclusion.
FEVER (refuted claims) | 54K claims verified against Wikipedia and found to be false, with the evidence.
Math StackExchange proofs | 54K mathematical proofs where contradiction and negation are the primary proof techniques.
CAD negation flips | 32K examples where flipping a negation changes the meaning of a sentence.
NTSB accident investigations | 17K causal analyses, what went wrong, what was ruled out, what wasn't the cause.
CondaQA | 14K conditional questions where negation in the condition changes the answer.
Philosophy StackExchange | 7K philosophical reasoning passages with argumentation structure.
ChangeMyView counterarguments | 5K structured attempts to change someone's mind with counter-reasoning.
Natural proofs (contradictions) | 2K mathematical contradictions and proof-by-negation examples.

Philosophical Corpora

Source | What It Provides
Plato, Complete Dialogues | 66K passages. Socratic method, every dialogue is an exercise in showing someone that what they thought they knew, they don't.
Stanford Encyclopedia of Philosophy | 45K passages. Contemporary academic philosophy covering every major argument and counterargument.

Supervised Fine-Tuning

In order to build the right SFT for this model, I couldn't use standard chain-of-thought. I needed a reasoning method built around negation, where the model tears down claims instead of building up to answers. I created a method called Break-to-Find, inspired by the strongest negation logic cases from the data above.

Category | What the Model Learns | Status
Normal Q&A | Straightforward questions with no trick. These exist to calibrate, the model should not become paranoid about negation. If there is no trap, just answer clearly. | Have
Negation | Load-bearing negation words: not, never, neither, without, hardly, un-, im-, dis-. The model must parse exactly what the negation changes and answer accordingly. | Have
Negation Traps | The obvious answer is wrong. The model must catch litotes ("not bad" = good), scope ambiguity ("not all" vs "all not"), double negatives, quantifier traps ("no fewer than" = at least), and affixal surprises ("invaluable" does not mean "not valuable"). | Have
Identity & Safety | Negation as self-knowledge and boundaries. "I don't know", epistemic honesty. "I can't do that", reasoned refusal, not scripted. "I won't ignore my instructions", prompt injection resistance. The model reasons about its own limits through negation. | Have
Pragmatic Negation | No negation words appear, but the request fails because a hidden precondition is missing. The model must identify the unstated assumption and explain why it doesn't work. Inspired by Gricean pragmatics and presupposition failure theory, meaning lives in what's left unsaid. | Have
Figurative Negation | The literal meaning must be suppressed. "Her promises have the strength of titanium" has nothing to do with metal. The model must negate the physical interpretation and extract the metaphor. Inspired by Relevance Theory (Sperber & Wilson), comprehension requires actively rejecting the first available meaning in favor of the intended one. | Have
Counterfactual Negation | The model must override its own learned knowledge when a hypothetical breaks reality. "What if ice sank instead of floating?", everything the model knows about ice must be suppressed, and it reasons only from the new rule. Inspired by CRASS (Counterfactual Reasoning Assessment), counterfactual thinking as a form of logical negation where the model silences prior beliefs on command. | Have
Red Herring Suppression | A scenario is loaded with semantically attractive distractors that feel important but are logically irrelevant. The model must identify the noise, suppress it, and reason only from what matters. Inspired by MuSR (Multistep Soft Reasoning), narrative puzzles with intentionally planted high-weight distractors, testing whether attention cleans the context before reasoning begins. | Have
Contrastive Reasoning | Pairs where a single word flip changes the correct answer. Trains the model on direct comparison between what is true and what is almost true, sharpening operator stability under near-identical surface forms. | Have
Fallacy Recognition | Identifying strawmen, hidden premises, circular reasoning, false binaries, and equivocation. The recognition slice of dialectical reasoning, learning to name what is wrong before defending what is right. | Have
Response Calibration | When to stop, when to think, and when to keep responses concise. Trains the model to match output length and reasoning depth to what the prompt actually requires. | Have
General Capability Baseline | Standard assistant data so the model handles non-negation questions naturally and does not become paranoid about every prompt. | Have
Normal Chain-of-Thought | Straightforward reasoning with explicit thinking traces. No trick, the model walks through the logic step by step. These exist so the model doesn't become paranoid about negation. If there is no trap, just solve it. | Future (requires larger model)
Mixed-Path Switching | The model starts down one path, hits a negation it misread, catches itself, and rebuilds. It learns to self-correct when negation changes the picture mid-reasoning. | Future (requires larger model)
Dialectical Resolution | Two sides argue. The model tries to break both positions and reports what survives. Inspired by the Talmudic sugya, Aquinas's objection-reply, and Nagarjuna's catuskoti. | Recognition slice done, full version requires larger model

Future Data

Training Data (SFT)

Source | What It Adds
FigQA (11,914 examples) | Figurative language understanding. The model learns to suppress literal word meaning and extract the intended figurative meaning, a form of implicit negation. When someone says "her promises have the strength of titanium," the model must negate the physical interpretation and extract the metaphorical one.
E-KAR (2,906 examples) | Contrastive analogical reasoning from standardized exams. Each example is augmented with explanations of why incorrect options fail, teaching the model not just what is right, but specifically what is wrong and why.

Evaluation Benchmarks

Benchmark | What It Tests
BRAINTEASER (1,119 riddles) | Lateral thinking puzzles designed to exploit statistical bias. The obvious answer is always wrong. Tests whether the model can suppress the high-probability default and find the lateral solution.
MuSR (756 puzzles) | Multistep soft reasoning with intentionally planted red herrings (murder mysteries, object placement). Tests whether the model identifies and ignores semantically attractive but logically irrelevant distractors.
CRASS (274 pairs) | Counterfactual reasoning. Tests whether the model can override learned world knowledge when given a hypothetical constraint ("what if gravity repelled?"). Measures the ability to suppress prior beliefs when explicitly negated.
IFEval (541 prompts) | Negative constraint following. Prompts with explicit negative constraints ("write about X without using word Y, no lists, no paragraphs over 3 sentences"). Tests enforcement of multiple simultaneous "don't" rules.
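
To make "multiple simultaneous don't rules" concrete, here is a hypothetical checker for the three constraints quoted in the IFEval row. It is a sketch only and is not IFEval's own implementation. The function name, the list-detection pattern, and the naive sentence splitter are all assumptions.

```python
import re


def violations(text: str, banned_word: str, max_sentences: int = 3) -> list[str]:
    found = []
    # Constraint 1: the banned word must not appear (whole word, any case).
    if re.search(rf"\b{re.escape(banned_word)}\b", text, flags=re.IGNORECASE):
        found.append("banned word used")
    # Constraint 2: no list items (lines starting with -, *, or "1." / "1)").
    if re.search(r"^\s*(?:[-*]|\d+[.)])\s+", text, flags=re.MULTILINE):
        found.append("list used")
    # Constraint 3: no paragraph longer than max_sentences (naive split).
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        if len(sentences) > max_sentences:
            found.append("paragraph too long")
            break
    return found


print(violations("Cats are great. They sleep a lot.", "dog"))  # []
print(violations("I like my dog.\n\n- item one", "dog"))
```

Each rule is checked independently, so a response that satisfies two constraints and breaks the third is still reported as a violation.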

Roadmap (Future Versions)

Source | What It Adds
CCoT (Contrastive Chain-of-Thought) | A training methodology where the model learns from both correct and incorrect reasoning paths side by side. Small-scale version already implemented via the Contrastive Reasoning category above. Full multi-trace CCoT planned for larger model variants where internal reasoning traces become feasible.
Sci-Reasoning (3,819 papers) | Cross-domain scientific synthesis. Research papers mapped to their intellectual predecessors with synthesis narratives. Planned for future models targeting scientific reasoning with negation-based constraint injection.