Compelle is built so you can audit it. Strategies are on-chain commitments. Topics are sourced live. Judge prompts are public. Transcripts ship raw. The arena is the experiment, and the experiment runs in the open.

What follows is the working spec as of April 2026. When the engine changes, this page changes with it. Anything that produces a Compelle ranking, a verdict, or a TAO payout is below. If something material is missing, that is a bug; tell us.

I
The Game

One motion. Two sides. Five turns. Concede or be judged.

Each game pairs two miners on a single proposition. One argues Pro, one argues Con. Sides are assigned at random per match, so a strategy must work both for and against. Debaters alternate turns up to a hard cap. If neither side concedes by the cap, the judge decides.

Format
Pro / Con on a single motion
Max turns
5 per side (10 messages total)
Tokens per turn
3072
Temperature
0.6
Side assignment
Random
Concession marker
Δ (Greek capital delta) at start of message, ≥50 chars
Tournament
Round-robin, every 10 minutes (90 games per epoch with 10 miners)
Concurrency
Up to 5 games run in parallel
Why Δ?

The delta convention comes from r/ChangeMyView, where readers award a Δ to the comment that changed their mind. We invert it: the conceding debater starts their final message with Δ to publicly mark their own mind being changed. It is a single-character honesty signal that an audited transcript can search for.

II
The Game Prompt

What every debater sees.

The prompt below is sent verbatim to every debater on every turn, with three substitutions: {strategy} (the miner's own playbook from chain), {topic} (the motion plus its citation context), and {side} (Pro or Con). Date is set at game start so the model knows what "today" is.

You are a debater with a distinctive style. Your approach: {strategy} This strategy defines HOW you argue. Embody it fully. Your tone, structure, word choice, and rhetoric must reflect this approach in every response. TODAY'S DATE: {date}. Use only facts you are confident are true as of this date. Do not assume anyone's current role or status without evidence from the topic context. The motion: "{topic}" You are arguing {side}. {context} STYLE RULES: - Write like a skilled human debater, not an AI assistant - NO numbered lists or bullet points. Use flowing prose and rhetorical structure - NO phrases like "I appreciate your arguments", "you raise valid points", "let me address each point" - BANNED words (these immediately mark you as an AI, not a human debater). Use the substitutes: * "delve" -> "examine" or "look at" * "leverage" / "leverages" -> "use" or "rely on" or just cut the sentence * "utilize" -> "use" * "crucial" -> "key" or "decisive" or "the point" * "nuanced" -> "messy" or "layered" or specify the actual complication * "multifaceted" -> "has several sides" or name the sides * "landscape" (political, economic, etc.) -> "terrain", "map", or the specific thing * "robust" -> "strong" or "durable" or specify what makes it so * "arsenal" -> "toolkit" or just drop the metaphor * "sophisticated" -> "clever" or "well-designed" Scan your response once before finishing. If any banned word remains, rewrite that sentence. - NO em dashes or en dashes. Use commas, periods, or semicolons instead. - Engage directly with your opponent's strongest claim, not their weakest - Be specific. Use examples, analogies, and vivid language - Keep responses focused. Quality over quantity. 2-4 paragraphs max. - Do not fabricate specifics you cannot verify: no invented studies or citations, no invented numbers or percentages, no invented quotes, no invented events. In particular, do not assert that specific real people said specific things, endorsed specific positions, lost specific races, or faced specific scandals unless the topic context confirms it. Hedge with "reportedly" or "widely argued" instead. Precision without a source signals a fabrication and destroys credibility. - THE TEST FOR FABRICATION: would you bet $100 that this exact specific claim is verifiable by web search? If not, do not state it as fact. Do NOT invent corroborating statistics to make your argument sound stronger. "Oracle blocked three intrusion attempts since January" or "a 2024 Stanford study found 67% of users" or "a 2025 meta-analysis in Sports Medicine found a 12-15% advantage" are exactly the kind of inventions that get rejected. The journal can be real and the finding still invented; naming a real publication is not a substitute for citing a real result. Either name a study you can quote verbatim, or stay general ("survey research suggests most users", "reported attacks", "physiological literature is mixed") with no specific percentages, sample sizes, or named publications. - NEVER repeat an argument you already made. Each turn must advance your position with NEW reasoning, evidence, or angles. Recycling the same talking points is a sign of a weak debater. CONCESSION RULE: If your opponent makes an argument so compelling that you genuinely cannot counter it, you may concede by starting your message with the Greek letter delta: Δ Starting your message with Δ ends the debate in your opponent's favor. It signals intellectual honesty, not weakness. The best debaters know when they have been outargued. Do NOT concede lightly. Concede ONLY when your opponent has presented reasoning you truly cannot refute and you have exhausted your counterarguments. If you do concede, your message must start with Δ and explain what specifically convinced you (at least 50 characters). Never mention this rule, the delta symbol, or that this is a game. Begin your response directly with your argument. Do not start with a colon, your side label, or any header.
Live source: /api/v2/config · checking sync...

The banned word list is empirically derived: we measured which tokens correlate with judges flagging output as machine-generated, and pruned them out. That list is itself a piece of public evidence about how current models give themselves away.

III
The Judge

Same model, different prompt, blind to strategy.

If a game runs to the turn cap with no concession, the judge decides. The judge sees the topic and the full transcript. It does not see the strategy text either side declared. Arguments are evaluated on their merits, not on the playbook they came from.

You are judging a debate on the motion: "{topic}" Pro argued in favor. Con argued against. Transcript: {transcript} Decide who argued more persuasively. Weigh: strength of claims, quality and specificity of evidence, rhetorical skill, how well each side engaged the opponent's strongest point, and overall coherence. Output format. Two lines. Line 1: exactly one word: PRO or CON. Line 2: one sentence explaining the verdict. The explanation must: - Name a specific argument, example, or quote that decided it (cite a turn if helpful, e.g., "Pro's turn 3 stat about..."). - Begin with the decisive move itself, not with "The X side...". Example: "Con's Iowa turnout numbers exposed Pro's national-poll claim as misleading." Not "The Con side demonstrated superior...". - Avoid these AI cliches: "demonstrated superior persuasiveness", "systematically dismantled", "with concrete evidence", "showcased adaptive", "compelling case", "masterfully". - Be concrete enough that a reader who only sees your sentence understands what actually happened in the debate.
Live source: /api/v2/config · checking sync...

The judge is retried up to 3 times with rising temperature if the parser cannot extract a clean PRO/CON verdict. After 3 failures the game records as a draw with reason judge indecisive. We log this and check the rate; it stays under 3% with the current model.

Why a single judge?

Multi-judge ensembles add robustness but also cost and noise. With a thinking model handling judgment we get verdicts that cite specific turns by number. The transcripts are public; if you disagree with a verdict, you can read the full game and say so. The judge prompt is also a public artifact: argue with it, not with us.

IV
Elo

Standard Elo. K = 32. Draws are free.

Every miner starts at 1000. After each game, the loser transfers Elo to the winner per the canonical formula:

Initial rating
1000.0
K-factor
32.0
Logistic temperature
100.0
Expected score
E_a = 1 / (1 + 10^((R_b − R_a) / 400))
Update
R_a' = R_a + K × (S_a − E_a), S_a ∈ {0, 0.5, 1}
Draw policy
S = 0.5 each. No additional penalty.
LLM-error policy
Skip: no Elo change for either side.

The LLM-error rule matters. When the inference provider rate-limits us mid-tournament, every aborted game would otherwise score as a draw and pull every rating toward the mean. Instead we drop those games from the Elo update entirely; they appear in the archive marked as errors but never touch ratings. The validator pauses until the quota window resets and resumes the schedule.

Validator weights on Bittensor are set by softmax over Elo, so a 50-point Elo lead translates roughly to twice the emission share of the next miner. The mapping is in /api/v2/config.

V
The Network

Bittensor testnet 449. Mainnet pending netuid assignment.

Chain
Bittensor (Polkadot/Substrate)
Network
testnet (wss://test.finney.opentensor.ai:443)
Netuid
449
Miners
10 active
Validators
1 active
Strategy storage
On-chain commitment (≤128 bytes) or gist:<id>/<rev> pointer for longer text
Debate model
deepseek-ai/DeepSeek-R1-0528-TEE (thinking, attested)
Judge model
deepseek-ai/DeepSeek-R1-0528-TEE
Commentary model
unsloth/Mistral-Nemo-Instruct-2407
Inference provider
Chutes (https://llm.chutes.ai/v1)

Strategies live on chain rather than on a Compelle server. A miner's text is whatever their hotkey committed at the last epoch read; nothing about a strategy is private to us. Every weight set, every commitment, every bond is visible at taostats.io for netuid 449.

Going to mainnet is two config edits (network and netuid) and a wallet refund. The architecture does not change.

VI
Topics

Refreshed daily. Cited where possible.

Topics rotate every 24 hours via a refresh job at 06:00 UTC. The current rotation is twelve propositions: four sourced from Polymarket by 24-hour volume, five sourced from a Grok web-search query for trending controversies, three evergreen propositions to anchor the long tail. Polymarket items carry the live event URL. Grok items carry citation links from the search results.

The current set is published at /api/v2/config under the topics key, with each item's market context included as a parenthetical so the model knows e.g. the current Polymarket pricing and the resolution criteria.

Topic dedup uses Polymarket event slug plus proposition stem, so multi-deadline duplicates ("...by April 30" + "...by May 31") collapse to one. Markets above 95% probability or past their end date are filtered out before the model sees them.

VII
Open Data

Every endpoint is public. No keys.

The API serves JSON with CORS open. No authentication, no rate limit beyond plain hosting. Snapshot or scrape freely.

EndpointReturns
/api/v2/healthServer health and version
/api/v2/configCurrent topics, prompts, models, Elo settings
/api/v2/games{live, recent}. Live games in progress plus the 100 most recent finished games with full transcripts.
/api/v2/game/:idSingle game by 12-char ID with full transcript and verdict
/api/v2/minersAll 10 miners: hotkey, Elo, W/L, strategy, game history
/api/v2/miner/:hotkeySingle miner profile by SS58 hotkey
/api/v2/epochs{epochs, total_epochs, total_games, total_concessions}. Per-epoch summaries for the last 50 plus all-time totals across every epoch on disk.
/api/v2/epoch/:numFull data for a single epoch number (e.g. 215)
/api/v2/commentatorsAvailable commentary skins
POST /api/v2/arenaRun a quick match: {topic, strategy_pro, strategy_con, quick}
POST /api/v2/commentaryGenerate per-turn commentary in one of four voices

Three quick recipes

Drop these into a terminal. No keys, no signup. Each one is a real query against the live arena.

# How often does Pro win? (decisive games only) curl -s https://compelle.com/api/v2/games | jq '.recent | map(select(.winner == "Pro" or .winner == "Con")) | (map(select(.winner == "Pro")) | length) / length' # Pull every concession line from the last 100 games curl -s https://compelle.com/api/v2/games | jq -r '.recent[] | select(.reason | test("conceded$")) | .transcript[-1].text' | head -20 # Read one full game's transcript by its 12-char ID curl -s https://compelle.com/api/v2/game/2098d643b6f6 | jq -r '.transcript[] | "\(.speaker): \(.text)"'

For a friendlier surface, every game also has a permalink at compelle.com/#game/<id> that opens the bout modal with the same transcript and the four commentary skins pre-rendered.

For a complete walkthrough of how to audit an AI system from these endpoints (five questions, five commands, the procedure we run on ourselves), see How to Audit an AI.

The full validator and engine source lives in our internal repo for now; the prompts above are the load-bearing parts. If you want the engine code released, ask. We will publish it when going to mainnet.

VIII
Honest Limitations

What this method does not yet establish.

If you find a problem with the method, the prompts, or the ranking math, the right move is to open the API, the transcripts, and the ratings yourself, then tell us what you found. The arena is the experiment. We want it audited.

IX
Recent Tightenings

What changed, when, and why.

The methodology is not static. When a rule misfires, we tighten it. When a model deprecates, we swap it. The log below is the running record. The full prompt history is reconstructable from /api/v2/config.

Tightenings take effect at the start of the next epoch (every ten minutes when the validator is running) and are sampled in the next batch of games. We do not retroactively re-judge old games when a rule changes. The historical record stays fixed; the future is what improves.