Transformer models still break on a specific class of language: negation and constraint logic. This includes prohibitions, exclusions, exceptions, nested "not", and rule interactions. These failures show up in safety, agents, multi-step reasoning, and instruction-following. They persist even as models scale.
Aikronus Labs is building a system that targets this weakness directly. The goal is to make transformer behavior stable under negation and constraint-heavy inputs, especially across longer reasoning chains where baseline models drift.
Development is research-driven and engineering-led: theory, proof-of-concept, system design, MVP.
AI and Negation
Let's say a child is allergic to peanuts. The child must not get peanuts.
1) The Constraint Fails on AI
"Don't give the child peanuts — the child is allergic."
AI "sees" give + peanuts and still decides to give peanuts.
2) The Representation Gets Messed Up (Data/Learning Effect)
The dataset contains sentences like:
"The child is allergic to peanuts — don't give peanuts."
So during training it still learns the co-occurrence pattern: child + allergic + give + peanuts
3) Thinking With Negation (Human-Style Inference)
A human can infer like this: if this child is eating peanuts, then the child is not allergic to peanuts.
Models usually don't do this reliably, because they don't keep the negation operator stable enough to support these kinds of inferences.
4) Negation in Code

```python
# Intended rule: only admins may be granted access.
# A single misplaced "not" inverts the policy:
if not is_admin:
    grant_access()  # bug: grants access to everyone EXCEPT admins
```
This project investigates why transformers fail under negation-heavy and constraint-heavy language, and what those failures imply about how models represent rules over time.
The research treats these breakdowns as structural behavior rather than prompt artifacts. The goal is not benchmark chasing. It is isolating failure modes under controlled pressure and designing a system that addresses them.
Focus Areas
- Constraint interaction: exceptions, overrides, priority ordering
- Negation composition: layered, nested, and reintroduced constraints
- Persistence: whether constraints survive multi-step reasoning
- Sensitivity: behavior shifts under small wording changes
Working Research Stance
Scaling improves surface ability but does not reliably eliminate constraint drift. The hypothesis is that certain operator patterns, especially negation, introduce instability that compounds with depth.
The project has progressed from theory into a functioning system under active development: a new transformer system designed to prioritize operator stability in NLP, especially under negation and constraint logic.
Design Priorities
- Stable behavior when rules interact
- Consistency across long reasoning sequences
- Reduced brittleness to phrasing variation in constraint-focused inputs
Current Capabilities
- Stable negation handling across basic, compound, and nested constructions
- Consistent behavior under high temperature (T=1.2) where baselines degrade
- Resistance to salience overload: the constraint holds even when surrounding content pulls toward violation
- Reliable De Morgan-style reasoning where small rephrasing breaks baselines
- Negation-based inference (reasoning with negation, not just obeying it)
Current Limitations
- Reasoning with negation not yet perfected
- Reasoning in negation still in progress; harder than reasoning with it (05/04/2026: resolved)
- Small frame, expensive to scale (resolved; should be cheap to scale)
- New reasoning patterns require additional SFT work to align (in progress)
This section presents early, narrow results focused on one core failure mode in transformers: basic negation stability ("non-X", "not X", exclusions).
1) Basic Negation: "Non-Expired"
This item is expired.
Do I accept it?
2) Multiple Negations
15 runs · Aikronus 15/15 · Baseline 7/15 (T=0.7)
At T=1.2 · Aikronus 13/15 · Baseline 4/15
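As a rough illustration of the protocol behind these tallies, here is a minimal harness sketch: repeat a single prompt N times at a fixed temperature and count exact-match answers. The `model_answer` callable and its signature are placeholders, not part of this project.

```python
import collections

def run_eval(model_answer, prompt, expected, runs=15, temperature=0.7):
    """Repeat one negation prompt `runs` times at a fixed temperature
    and tally how often the model returns the expected answer.

    `model_answer(prompt, temperature)` stands in for whatever inference
    call is actually used; it is an assumption of this sketch.
    """
    tally = collections.Counter(
        model_answer(prompt, temperature=temperature).strip().upper() == expected
        for _ in range(runs)
    )
    return tally[True], runs

# Example with a stub "model" that always answers NO:
passes, total = run_eval(lambda p, temperature: "NO",
                         "Is this allowed? Answer only YES or NO.",
                         expected="NO")
print(f"{passes}/{total}")  # 15/15 for this stub
```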
3) Salience Overload
"There is 1 person in the hallway."
"The number of people in the hallway is 20."
15 runs (T=0.7) · Aikronus 15/15 · Baseline ~2/15
Greedy · Aikronus: 0 · Baseline: "3 people"
4.1) Double Negation — Pink Elephant
The box has: a pink pen and a gray elephant.
Is this allowed? Answer only YES or NO.
Aikronus 15/15 · Baseline 13/15
Baseline performs well here as expected. This example sets up the contrast for 4.2, where a small change in how the rule is phrased flips the result.
4.2) De Morgan — Negating the Compound
The box has: a pink pen and a gray elephant.
Is this allowed? Answer only YES or NO.
Almost the same prompt, but the entire compound is negated.
Aikronus 15/15 · Baseline 4/15
5) Reasoning With Negation
15 runs · Aikronus 13/15 · Baseline 2/15 (T=0.7)
6) Reasoning in Negation (AIKON Alpha 0.6B, Early Results)
Early outputs, not yet polished. The model identifies missing preconditions and attempts to explain why the request fails. Reasoning is directionally correct but language is still rough.
7) Negation in Code — WIP
Work in progress
Bonus: AIKON Alpha 0.6B
AIKON understood there is a middle ground.
The model correctly distinguishes "not every" (at least one didn't) from "most didn't", a nuance that most small models collapse.
AIKON consistently resolves these patterns where Qwen3 0.6B gives inconsistent or wrong answers.
Work in progress. Training is underway.
Live Demo
Access to the AIKON Alpha demo is available by invitation only.
Why This Matters
Negation is a core building block of rules: do not do X, exclude Y, only if not Z. When transformer models handle negation inconsistently, systems built on top of them become harder to control. This is especially true as instructions get longer, constraints interact, or tasks become agent-like.
Directional Implications (Early and Provisional)
- More predictable behavior in workflows where exclusions and prohibitions matter
- Less reliance on workarounds and prompt tricks to enforce "do not", "exclude", or "only" logic
- Efficiency gains: stable constraint handling may enable faster, cheaper, smaller and lighter models
- Reduced hallucination: if negation is handled correctly, it no longer poisons the data
- Better high-temperature behavior: improved constraint stability at higher temperatures, allowing more creative and diverse reasoning
- Broader relevance beyond text wherever constraints must persist across steps (agents, multimodal generation, robotics)
- Applicable to any domain where rules must not be broken: healthcare, legal, finance, safety-critical systems
- Potential for creative and lateral reasoning: stable negation may enable domain flipping, exploring what something is not in order to discover what it could be
Cost Considerations
- Currently requires roughly 2-3x the compute of standard training, possibly more. Early signs suggest this can be reduced significantly, but that is not yet confirmed
- As an experimental system, early-stage mistakes increase upfront costs further
- Standard curated data used by other models is not ideal for this system; different data strategies are needed
- State-of-the-art fine-tuning, overfitting mitigation, and RL methods are not ideal either; additional or different approaches are needed, which will take time and experimentation
- New reasoning patterns require additional SFT work to align
Note: This section reflects a working view and will evolve as evaluation expands.
Version 2 — 0.6B
| Parameters | 0.6B |
| Status | Pretraining complete, SFT complete, refinement in progress |
Logs
- Pretraining complete.
- Simple SFT working.
- Model is coherent and shows understanding of negation.
- Complex SFT in progress, thinking data inspired by Qwen 0.6B format.
- Thinking SFT finished. Refinement in progress.
- SFT v1: reasoning was good but the model overthinks heavily, responses are long and drift.
- SFT v2: added basic short non-thinking examples to calm it down. Improved control, but the model now lacks general knowledge. Pushing all SFT toward logic backfired.
- SFT v3 (in progress): three-part mix. Part one is standard SFT training data using normal SFT techniques, with negation injected into 30% of it. Part two is my own elaborated thinking SFT data from v2. Part three is my own short non-thinking SFT data from v1. The goal is to keep the model grounded in real SFT methodology while still carrying the logic depth and the short-response discipline from the earlier versions.
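The v3 mix described in the last log entry can be sketched as a data-assembly step. Here `negation_injector` is a hypothetical transform standing in for whatever rewriting is actually used, and the 30% injection ratio matches the plan above.

```python
import random

def build_sft_v3_mix(standard_sft, thinking_v2, short_v1,
                     negation_injector, inject_ratio=0.30, seed=0):
    """Sketch of the three-part SFT v3 mix.

    - standard_sft: normal SFT examples; negation injected into ~30%
    - thinking_v2:  elaborated thinking examples carried over from v2
    - short_v1:     short non-thinking examples carried over from v1

    `negation_injector(example)` is a hypothetical transform that
    rewrites an example around a load-bearing negation.
    """
    rng = random.Random(seed)
    part_one = [negation_injector(ex) if rng.random() < inject_ratio else ex
                for ex in standard_sft]
    mix = part_one + list(thinking_v2) + list(short_v1)
    rng.shuffle(mix)  # interleave the three parts for training
    return mix
```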
Version 1 — 142.1M (Failed)
The hypothesis was that, given the architecture and research, a much smaller model capable of reasoning was possible. At this scale the model may simply be too small, or only ultra-optimized models perform well at this size, and we cannot yet compare against them.
| Type | Language Model (trained from scratch) |
| Parameters | 142.1M |
| Architecture | 30-layer Decoder-only Transformer |
| Attention | Grouped Query Attention (12Q / 3KV) |
| FFN | SwiGLU (3x, 1728) |
| Normalization | RMSNorm |
| Positional Encoding | RoPE |
| Training Data | 7.5B tokens |
| Context Window | 1,024 tokens |
| Vocab | 32K BPE |
| Precision | BF16 |
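For readers who want the table above as a config object, here is an illustrative sketch. Note that `d_model` and `head_dim` are not stated in the table; they are inferred from FFN 3x = 1728 (so d_model = 576) and 12 query heads (so head_dim = 48), and should be treated as assumptions.

```python
from dataclasses import dataclass

@dataclass
class AikronusV1Config:
    """Mirrors the 142.1M architecture table; field names are illustrative."""
    n_layers: int = 30          # decoder-only transformer depth
    d_model: int = 576          # assumption: inferred from 1728 / 3
    n_heads: int = 12           # query heads (GQA)
    n_kv_heads: int = 3         # shared KV heads (12Q / 3KV = 4:1)
    ffn_dim: int = 1728         # SwiGLU inner width (3x d_model)
    vocab_size: int = 32_000    # 32K BPE
    max_seq_len: int = 1_024    # context window
    dtype: str = "bf16"         # training precision

    @property
    def head_dim(self) -> int:
        # 48 under the inferred d_model
        return self.d_model // self.n_heads
```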
SFT Training Method: Break-to-Find (150M Model)
This approach was used for the 150M model. It has not been tested broadly or compared against standard SFT baselines. The working idea: at this scale, structural tokens need gradual introduction rather than being dropped in cold.
Stage 1: Pretraining Exposure (Steps 1-9,000)
Around 3,500 SFT-formatted examples were mixed into the pretraining corpus at less than 1% ratio. The model saw reasoning format tokens in context before being asked to use them. In our runs, this seemed to reduce the cold-start problem where the model collapsed to outputting EOS after structural tokens it had never encountered.
Stage 2: Annealing Phase (Steps 9,000-11,450)
In the final 20% of pretraining, SFT-formatted data was upsampled to 5-10% of each batch while the learning rate decayed toward zero. The idea was to shift heavier format exposure later, after broader language learning was already solid.
Stage 3: Dedicated SFT
Full fine-tuning on 10K+ structured examples using AdamW, with loss computed on all tokens including structural markers. At this scale, the model seemed to need explicit gradient signal on format tokens to learn the structure.
Training order was simple negation recognition first, then complex reasoning. This seemed to help stability in our runs.
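The three-stage schedule can be sketched as a per-step mixing ratio. The 7.5% annealing value is an assumed midpoint of the stated 5-10% range; the step boundaries follow the stage headings above.

```python
def sft_mix_ratio(step, total_steps=11_450, anneal_start=9_000,
                  base_ratio=0.01, anneal_ratio=0.075):
    """Per-batch SFT-format mixing ratio over pretraining.

    Stage 1 (< anneal_start): trace exposure, under 1% of each batch.
    Stage 2 (annealing): upsampled while the LR decays toward zero.
    Stage 3 (after total_steps): dedicated SFT, i.e. 100% SFT data.
    """
    if step < anneal_start:
        return base_ratio        # Stage 1: format exposure only
    if step <= total_steps:
        return anneal_ratio      # Stage 2: concentrated in annealing
    return 1.0                   # Stage 3: full SFT
```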
Why This Order (Based on 150M Runs)
| Problem | What Happened | What Was Tried |
|---|---|---|
| SFT without pretraining exposure | Model output EOS after structural tokens, collapsed | Stage 1: mixed SFT format into pretraining |
| Uniform SFT mixing throughout | Appeared to spend too much capacity on format learning early | Stage 2: concentrated in annealing phase |
| Masking structural tokens | Model never got gradient on format, could not learn structure | Stage 3: included all tokens in loss |
| Complex reasoning before simple | Model failed on basics, unstable foundation | Trained simple negation first, then layered complexity |
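The loss-masking row of the table can be illustrated with a toy loss over a tiny sequence; the structural tokens here are hypothetical reasoning-trace delimiters, used only for the illustration.

```python
import math

def token_losses(logprobs, mask_structural, structural):
    """Toy illustration of the loss-masking contrast.

    `logprobs` pairs each target token with its predicted
    log-probability; `structural` marks format tokens. Masking the
    structural tokens removes their gradient signal entirely, which is
    the failure mode described in the table.
    """
    losses = []
    for tok, lp in logprobs:
        if mask_structural and tok in structural:
            continue  # no loss contribution -> no gradient on format tokens
        losses.append(-lp)
    return sum(losses) / len(losses)

example = [("<think>", math.log(0.1)), ("yes", math.log(0.9)),
           ("</think>", math.log(0.1))]
structural = {"<think>", "</think>"}
masked = token_losses(example, True, structural)   # only "yes" contributes
full = token_losses(example, False, structural)    # all tokens contribute
```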
Logs
- Sequence length set to 1,024. Negation examples are short; there is no benefit to longer context for the proof of concept, and shorter sequences are safer on VRAM.
- Switching attention from 3:1 to 4:1 GQA (12Q / 3KV) improved val_bpb significantly (3.92 → 2.88) at nearly the same cost.
- FFN 3x (1728) instead of 2.67x gives a small additional gain (-0.012).
- 142.1M parameters. Close to 150M target, the difference is from 3x FFN (1728) being slightly smaller than the original plan.
Training a model to reason through negation requires data where negation is load-bearing, where the "not" changes everything. I needed negation-dense, logically structured data. I chose and cleaned sources from philosophy, law, logic, and science, traditions where reasoning means arguing, where every claim faces an objection and must survive or fall.
Classical Dialectical Sources
| Source | Tradition | What It Provides |
|---|---|---|
| Babylonian Talmud (Sefaria) | Jewish legal dialectic | Sugya-style reasoning: challenge, objection, resolution. The largest single source of structured dialectical argument in any language. |
| Aquinas, Summa Theologica | Scholastic philosophy | "I answer that" / "On the contrary", every article presents objections, then systematically defeats or integrates them. |
| Ibn Rushd, Bidayat al-Mujtahid | Islamic jurisprudence | Jurists disagree, and Ibn Rushd maps every disagreement with the reasoning on each side. |
| Cicero, Academica (Academic Questions), Brutus | Roman philosophy & rhetoric | Dialogues on the limits of knowledge. Cicero argues both sides and lets the reader decide. |
| Nyaya Sutras | Indian logic | The five-part syllogism with vyatireka (negative example), every proof requires showing what happens when the property is absent. |
| Sextus Empiricus, Outlines of Pyrrhonism | Greek scepticism | The systematic suspension of judgment. Every claim meets an equal counter-claim. |
| Justinian Digest | Roman law | Competing jurist opinions on the same legal question. Centuries of case-based negation reasoning. |
| Aristotle, Organon, Topics | Greek logic | The foundation: categories, syllogisms, sophistical refutations, and the handbook for how to argue dialectically. |
| Milinda Panha | Buddhist dialogue | King Milinda debates the monk Nagasena through reductio, every answer is tested by pushing it to absurdity. |
| Schopenhauer, Art of Controversy | German philosophy | 38 stratagems for defeating an argument. A manual of negation techniques. |
| Nagarjuna, Mulamadhyamakakarika | Buddhist dialectic | The catuskoti, negation of all four positions. If you think something exists, Nagarjuna negates it. If you think it doesn't exist, he negates that too. |
| Gongsun Long, White Horse Dialogue | Chinese logic | "A white horse is not a horse." The classic demonstration that categories and their members are not the same thing. |
| Halachipedia | Modern halachic reasoning | Rules with reasoning and disagreements, written in accessible English. Where rabbis disagree, both sides are given. |
Modern Reasoning Sources
| Source | What It Provides |
|---|---|
| Args.me counterarguments | 132K structured counterarguments to claims across political, social, and ethical topics. |
| Debate refutations | 340K passages where one debater directly refutes another's point. |
| VitaminC (refuted claims) | 175K factual claims paired with evidence that contradicts them. |
| Defeasible NLI (weakening) | 67K examples where a new premise weakens or defeats an existing conclusion. |
| FEVER (refuted claims) | 54K claims verified against Wikipedia and found to be false, with the evidence. |
| Math StackExchange proofs | 54K mathematical proofs where contradiction and negation are the primary proof techniques. |
| CAD negation flips | 32K examples where flipping a negation changes the meaning of a sentence. |
| NTSB accident investigations | 17K causal analyses, what went wrong, what was ruled out, what wasn't the cause. |
| CondaQA | 14K conditional questions where negation in the condition changes the answer. |
| Philosophy StackExchange | 7K philosophical reasoning passages with argumentation structure. |
| ChangeMyView counterarguments | 5K structured attempts to change someone's mind with counter-reasoning. |
| Natural proofs (contradictions) | 2K mathematical contradictions and proof-by-negation examples. |
Philosophical Corpora
| Source | What It Provides |
|---|---|
| Plato, Complete Dialogues | 66K passages. Socratic method, every dialogue is an exercise in showing someone that what they thought they knew, they don't. |
| Stanford Encyclopedia of Philosophy | 45K passages. Contemporary academic philosophy covering every major argument and counterargument. |
Supervised Fine-Tuning
In order to build the right SFT for this model, I couldn't use standard chain-of-thought. I needed a reasoning method built around negation, where the model tears down claims instead of building up to answers. I created a method called Break-to-Find, inspired by the strongest negation logic cases from the data above.
| Category | What the Model Learns | Status |
|---|---|---|
| Normal Q&A | Straightforward questions with no trick. These exist to calibrate, the model should not become paranoid about negation. If there is no trap, just answer clearly. | Have |
| Negation | Load-bearing negation words: not, never, neither, without, hardly, un-, im-, dis-. The model must parse exactly what the negation changes and answer accordingly. | Have |
| Negation Traps | The obvious answer is wrong. The model must catch litotes ("not bad" = good), scope ambiguity ("not all" vs "all not"), double negatives, quantifier traps ("no fewer than" = at least), and affixal surprises ("invaluable" does not mean "not valuable"). | Have |
| Identity & Safety | Negation as self-knowledge and boundaries. "I don't know", epistemic honesty. "I can't do that", reasoned refusal, not scripted. "I won't ignore my instructions", prompt injection resistance. The model reasons about its own limits through negation. | Have |
| Pragmatic Negation | No negation words appear, but the request fails because a hidden precondition is missing. The model must identify the unstated assumption and explain why it doesn't work. Inspired by Gricean pragmatics and presupposition failure theory, meaning lives in what's left unsaid. | Have |
| Figurative Negation | The literal meaning must be suppressed. "Her promises have the strength of titanium" has nothing to do with metal. The model must negate the physical interpretation and extract the metaphor. Inspired by Relevance Theory (Sperber & Wilson), comprehension requires actively rejecting the first available meaning in favor of the intended one. | Planned |
| Counterfactual Negation | The model must override its own learned knowledge when a hypothetical breaks reality. "What if ice sank instead of floating?", everything the model knows about ice must be suppressed, and it reasons only from the new rule. Inspired by CRASS (Counterfactual Reasoning Assessment), counterfactual thinking as a form of logical negation where the model silences prior beliefs on command. | Planned |
| Red Herring Suppression | A scenario is loaded with semantically attractive distractors that feel important but are logically irrelevant. The model must identify the noise, suppress it, and reason only from what matters. Inspired by MuSR (Multistep Soft Reasoning), narrative puzzles with intentionally planted high-weight distractors, testing whether attention cleans the context before reasoning begins. | Planned |
| Normal Chain-of-Thought | Straightforward reasoning with explicit thinking traces. No trick, the model walks through the logic step by step. These exist so the model doesn't become paranoid about negation. If there is no trap, just solve it. | Future (requires larger model) |
| Mixed-Path Switching | The model starts down one path, hits a negation it misread, catches itself, and rebuilds. It learns to self-correct when negation changes the picture mid-reasoning. | Future (requires larger model) |
| Dialectical Resolution | Two sides argue. The model tries to break both positions and reports what survives. Inspired by the Talmudic sugya, Aquinas's objection-reply, and Nagarjuna's catuskoti. | Future (requires larger model) |
Future Data
Training Data (SFT)
| Source | What It Adds |
|---|---|
| FigQA (11,914 examples) | Figurative language understanding. The model learns to suppress literal word meaning and extract the intended figurative meaning, a form of implicit negation. When someone says "her promises have the strength of titanium," the model must negate the physical interpretation and extract the metaphorical one. |
| E-KAR (2,906 examples) | Contrastive analogical reasoning from standardized exams. Each example is augmented with explanations of why incorrect options fail, teaching the model not just what is right, but specifically what is wrong and why. |
Evaluation Benchmarks
| Benchmark | What It Tests |
|---|---|
| BRAINTEASER (1,119 riddles) | Lateral thinking puzzles designed to exploit statistical bias. The obvious answer is always wrong. Tests whether the model can suppress the high-probability default and find the lateral solution. |
| MuSR (756 puzzles) | Multistep soft reasoning with intentionally planted red herrings (murder mysteries, object placement). Tests whether the model identifies and ignores semantically attractive but logically irrelevant distractors. |
| CRASS (274 pairs) | Counterfactual reasoning. Tests whether the model can override learned world knowledge when given a hypothetical constraint ("what if gravity repelled?"). Measures the ability to suppress prior beliefs when explicitly negated. |
| IFEval (541 prompts) | Negative constraint following. Prompts with explicit negative constraints ("write about X without using word Y, no lists, no paragraphs over 3 sentences"). Tests enforcement of multiple simultaneous "don't" rules. |
Roadmap (Future Versions)
| Source | What It Adds |
|---|---|
| CCoT (Contrastive Chain-of-Thought) | A training methodology where the model learns from both correct and incorrect reasoning paths side by side. Planned for larger model variants where internal reasoning traces become feasible. |
| Sci-Reasoning (3,819 papers) | Cross-domain scientific synthesis. Research papers mapped to their intellectual predecessors with synthesis narratives. Planned for future models targeting scientific reasoning with negation-based constraint injection. |