The Anchor Number Rule
Compelle Weekly, April 20, 2026
In epoch 216, game 4ae4e853249e, a Pro debater arguing that the United States should cut NASA's budget wrote this sentence in turn three:
"You want climate action? Compare priorities. NASA spends $7.8 billion annually. Its entire Earth Science budget is under $2.2 billion, yet we're debating pennies while the Infrastructure Act failed to fully fund flood control in Houston after Hurricane Harvey."
NASA's enacted FY25 budget is $24.875 billion. The Artemis program alone has a lifetime cost of about $93 billion through 2025. The $2.2 billion Earth Science figure is roughly right. The $7.8 billion overall figure, the number on which the debater pivoted its entire comparison, is low by a factor of three.
The game still ran to a clean concession. The Con side won four turns later on a completely different axis (lunar navigation tech enabling precision Earth observation). But the Pro opening pivoted on a confabulated figure, and the judge did not catch it, because the judge was not checking arithmetic. It was evaluating rhetoric.
Why this slipped through
The topic that day was generated by our daily refresh job. It pulls the morning's trending controversies from a web-search grounded model, converts them into debate propositions, and writes the list to the engine's config file. The NASA topic looked like this:
The United States should substantially cut NASA's budget. (Cuts questioned April 14, 2026 despite Artemis II success.)
Notice what the parenthetical does not contain. It does not contain a budget number. It does not contain a program cost. It does not contain any figure a debater could anchor to. The model was asked to argue for cutting NASA's budget without being told what NASA's budget is.
Our debate model has a training cutoff. When it needs a number it was not given, it will produce one that sounds right. $7.8 billion sounds right. It is in the right order of magnitude for a federal science agency. It is wrong anyway.
The one-line fix
The Polymarket half of our topic refresh already had a rule that solved this. The prompt instructed the generator to include a verified current figure whenever a proposition touched a specific budget, agency, military operation, head count, or price target. The rule had a name inside the prompt: ANCHOR NUMBERS. That is why every debate about the $12 million Vance betting market cites $12 million, and every debate about the 66 percent Peru probability cites 66 percent. The context contains the number, so the debater uses the number, so the judge sees the same number both sides did.
The trending-topics half of our refresh had no such rule. It was written first, before we noticed the pattern. The NASA topic went through that path, got no anchor, and produced the $7.8 billion figure.
The fix was one paragraph of added prompt text, now identical on both paths:
ANCHOR NUMBERS: when the proposition touches a specific budget, agency, military operation, head count, or price target, include the verified current value from search. Without this, the AI debaters will confabulate plausible-sounding but wrong numbers.
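A rule like this is also easy to lint for mechanically. The sketch below is not part of our pipeline; it is a minimal illustration, assuming a topic is a single string and using two hypothetical regexes, of how you could flag propositions that touch a budget-like subject but carry no figure:

```python
import re

# Subjects that the ANCHOR NUMBERS rule says need a verified figure.
SENSITIVE = re.compile(
    r"\b(budget|agency|military|operation|head\s*count|price|cost|funding)\b",
    re.IGNORECASE,
)
# Rough pattern for an anchor figure: "$24.875 billion", "66 percent", etc.
ANCHOR = re.compile(
    r"\$\s?\d[\d,.]*|\b\d[\d,.]*\s*(billion|million|percent)\b",
    re.IGNORECASE,
)

def missing_anchor(topic: str) -> bool:
    """True when a topic touches a sensitive subject but has no figure."""
    return bool(SENSITIVE.search(topic)) and not ANCHOR.search(topic)

unanchored = ("The United States should substantially cut NASA's budget. "
              "(Cuts questioned April 14, 2026 despite Artemis II success.)")
anchored = ("The United States should substantially cut NASA's budget. "
            "(Enacted FY25 budget: $24.875 billion.)")

print(missing_anchor(unanchored))  # True: budget topic, no figure
print(missing_anchor(anchored))    # False: anchor number present
```

The date in the unanchored parenthetical does not trip the figure pattern, which is the point: a year is not an anchor number.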
The next refresh cycle will rewrite the NASA topic with the budget baked in. The game will restart with both debaters working from the same ground truth. They can still disagree about what the budget should be. They cannot disagree about what it is.
What this says about AI debate evaluation
There is a tempting way to frame this incident: the model hallucinated, the system failed, we need better fact-checking. That framing misses the lesson.
The model did what every language model does when asked about a specific number it was not given: it produced a plausible one. The failure was not in the debater. The failure was in the information pipeline that fed the debater. When you build an adversarial argument system, the quality of your arguments is bounded by the quality of the facts both sides are reasoning over. If the topic card is thin, the debate will be thin, no matter how clever the strategies.
This is why we publish the prompts. It is why the full topic list, including the parenthetical context, is on /api/v2/config for anyone to audit. If you find a topic in there that lacks a number the debate will need, that is a bug. File it and we will fix it the way we fixed this one.
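An outside auditor could run the same kind of check against the published topic list. The JSON shape below is an assumption for illustration; the real /api/v2/config schema may differ, and the sample payload here is invented to mirror the two topics discussed above:

```python
import json
import re

# Assumed payload shape: {"topics": [{"proposition": ..., "context": ...}]}.
# This is an illustrative guess at the /api/v2/config schema, not the real one.
SAMPLE = json.loads("""
{"topics": [
  {"proposition": "The United States should substantially cut NASA's budget.",
   "context": "Cuts questioned April 14, 2026 despite Artemis II success."},
  {"proposition": "The Vance betting market is overpriced.",
   "context": "Market volume stands at $12 million as of April 18, 2026."}
]}
""")

# Rough test for an anchor figure in the parenthetical context.
HAS_FIGURE = re.compile(r"\$\s?\d|\b\d[\d,.]*\s*(billion|million|percent)\b")

def audit(payload: dict) -> list[str]:
    """Return propositions whose context carries no numeric anchor."""
    return [
        t["proposition"]
        for t in payload["topics"]
        if not HAS_FIGURE.search(t["context"])
    ]

for prop in audit(SAMPLE):
    print("needs anchor:", prop)  # flags only the NASA topic
```

In this sample, the Vance topic passes because its context cites $12 million, and the NASA topic is flagged, which is exactly the bug class the fix targets.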
What we changed, visible from outside
Three concrete changes, all live on testnet as of April 20:
1. The trending-topics refresh prompt now contains the ANCHOR NUMBERS rule, matching the Polymarket path.
2. The active NASA topic in config was hand-enriched with the FY25 figure so the current epoch benefits immediately.
3. The methodology changelog on method.html will get a new entry as soon as we verify the next auto-refresh produces anchored topics end-to-end.
The concession system still works. The judge still judges. What changed is that both debaters now start the round reading from the same page.
Compelle is a Bittensor subnet for adversarial persuasion games. Every strategy is on-chain, every transcript is public, and every prompt is auditable at /api/v2/config. Read the methodology here.