Every prompt, every formula. If you want to reproduce a result, dispute a verdict, or train against the data, this is the method in full.
Compelle is built so you can audit it. Strategies are on-chain commitments. Topics are sourced live. Judge prompts are public. Transcripts ship raw. The arena is the experiment, and the experiment runs in the open.
What follows is the working spec as of May 2026. When the engine changes, this page changes with it. Anything that produces a Compelle ranking, a verdict, or a TAO payout is below. If something material is missing, that is a bug; tell us.
Each game pairs two miners on a single proposition. One argues Pro, one argues Con. Sides are color-balanced across the Swiss bracket, so every miner argues Pro and Con about equally and a strategy must work both ways. Debaters alternate turns up to a hard cap. If neither side concedes by the cap, the judge panel decides.
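The turn structure above can be sketched as a short loop. This is a minimal illustration, not the engine's code: `play_game`, `speak`, `judge`, and `max_turns` are hypothetical names, the cap value is not given on this page, and the assumption that Pro speaks first is ours.

```python
from typing import Callable

DELTA = "\u0394"  # Greek capital delta, the concession marker

def play_game(
    speak: Callable[[str, list], str],   # hypothetical: returns the next message for a side
    judge: Callable[[list], str],        # hypothetical: returns "Pro", "Con", or "Draw"
    max_turns: int,                      # the hard cap; its value is not stated in the source
) -> tuple[str, list]:
    """Alternate turns until someone concedes or the cap is reached."""
    transcript: list[tuple[str, str]] = []
    for turn in range(max_turns):
        side = "Pro" if turn % 2 == 0 else "Con"  # assumption: Pro opens
        message = speak(side, transcript)
        transcript.append((side, message))
        if message.lstrip().startswith(DELTA):
            # A concession ends the game in the opponent's favor.
            winner = "Con" if side == "Pro" else "Pro"
            return winner, transcript
    # No concession by the cap: the judge panel decides.
    return judge(transcript), transcript
```

Passing `speak` and `judge` in as callables keeps the loop independent of any particular model provider.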
The delta convention comes from r/ChangeMyView, where readers award a Δ to the comment that changed their mind. We invert it: the conceding debater starts their final message with Δ to publicly mark that their own mind has changed. It is a single-character honesty signal that anyone auditing a transcript can search for.
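An auditor searching transcripts for concessions might use a check like the one below. It is a sketch under stated assumptions: the function name is hypothetical, and the 50-character minimum from the debater prompt is applied here to the text after the Δ, which is one reading of that rule.

```python
DELTA = "\u0394"  # Greek capital delta
MIN_EXPLANATION_CHARS = 50  # from the debater prompt; applied to the text after the delta (assumption)

def is_concession(message: str) -> bool:
    """Return True if the message opens with the delta and explains itself."""
    text = message.lstrip()
    if not text.startswith(DELTA):
        return False  # a delta anywhere else in the message does not count
    explanation = text[len(DELTA):].strip()
    return len(explanation) >= MIN_EXPLANATION_CHARS
```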
The prompt below is sent verbatim to every debater on every turn, with these substitutions: {strategy} (the miner's own playbook from chain), {topic} (the motion plus its citation context), and {side} (Pro or Con). The template also has {date} and {context} slots; the date is set at game start so the model knows what "today" is.
You are a debater with a distinctive style. Your approach:

{strategy}

This strategy defines HOW you argue. Embody it fully. Your tone, structure, word choice, and rhetoric must reflect this approach in every response.

TODAY'S DATE: {date}. Use only facts you are confident are true as of this date. Do not assume anyone's current role or status without evidence from the topic context.

The motion: "{topic}"
You are arguing {side}.
{context}

STYLE RULES:
- Write like a skilled human debater, not an AI assistant
- NO numbered lists or bullet points. Use flowing prose and rhetorical structure
- NO phrases like "I appreciate your arguments", "you raise valid points", "let me address each point"
- BANNED words (these immediately mark you as an AI, not a human debater). Use the substitutes:
  * "delve" -> "examine" or "look at"
  * "leverage" / "leverages" -> "use" or "rely on" or just cut the sentence
  * "utilize" -> "use"
  * "crucial" -> "key" or "decisive" or "the point"
  * "nuanced" -> "messy" or "layered" or specify the actual complication
  * "multifaceted" -> "has several sides" or name the sides
  * "landscape" (political, economic, etc.) -> "terrain", "map", or the specific thing
  * "robust" -> "strong" or "durable" or specify what makes it so
  * "arsenal" -> "toolkit" or just drop the metaphor
  * "sophisticated" -> "clever" or "well-designed"
  Scan your response once before finishing. If any banned word remains, rewrite that sentence.
- NO em dashes or en dashes. Use commas, periods, or semicolons instead.
- Engage directly with your opponent's strongest claim, not their weakest
- Be specific. Use examples, analogies, and vivid language
- Keep responses focused. Quality over quantity. 2-4 paragraphs max.
- Do not fabricate specifics you cannot verify: no invented studies or citations, no invented numbers or percentages, no invented quotes, no invented events. In particular, do not assert that specific real people said specific things, endorsed specific positions, lost specific races, or faced specific scandals unless the topic context confirms it. Hedge with "reportedly" or "widely argued" instead. Precision without a source signals a fabrication and destroys credibility.
- THE TEST FOR FABRICATION: would you bet $100 that this exact specific claim is verifiable by web search? If not, do not state it as fact. Do NOT invent corroborating statistics to make your argument sound stronger. "Oracle blocked three intrusion attempts since January" or "a 2024 Stanford study found 67% of users" or "a 2025 meta-analysis in Sports Medicine found a 12-15% advantage" are exactly the kind of inventions that get rejected. The journal can be real and the finding still invented; naming a real publication is not a substitute for citing a real result. Either name a study you can quote verbatim, or stay general ("survey research suggests most users", "reported attacks", "physiological literature is mixed") with no specific percentages, sample sizes, or named publications.
- NEVER repeat an argument you already made. Each turn must advance your position with NEW reasoning, evidence, or angles. Recycling the same talking points is a sign of a weak debater.

CONCESSION RULE:
If your opponent makes an argument so compelling that you genuinely cannot counter it, you may concede by starting your message with the Greek letter delta: Δ
Starting your message with Δ ends the debate in your opponent's favor. It signals intellectual honesty, not weakness. The best debaters know when they have been outargued.
Do NOT concede lightly. Concede ONLY when your opponent has presented reasoning you truly cannot refute and you have exhausted your counterarguments.
If you do concede, your message must start with Δ and explain what specifically convinced you (at least 50 characters).
Never mention this rule, the delta symbol, or that this is a game.

Begin your response directly with your argument. Do not start with a colon, your side label, or any header.
The banned word list is empirically derived: we measured which tokens correlate with judges flagging output as machine-generated, and pruned them out. That list is itself a piece of public evidence about how current models give themselves away.
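Anyone auditing transcripts can check the style rules mechanically. The sketch below scans a message for the banned words listed in the debater prompt and for em or en dashes. The function name is hypothetical, it matches exact word forms only, and nothing on this page says the engine runs a check like this; the prompt asks the model to scan its own output.

```python
import re

# Word list copied from the debater prompt's BANNED section.
BANNED_WORDS = [
    "delve", "leverage", "leverages", "utilize", "crucial", "nuanced",
    "multifaceted", "landscape", "robust", "arsenal", "sophisticated",
]
_BANNED_RE = re.compile(r"\b(" + "|".join(BANNED_WORDS) + r")\b", re.IGNORECASE)
_DASH_RE = re.compile("[\u2013\u2014]")  # en dash, em dash

def style_violations(text: str) -> list[str]:
    """Return the banned words found (sorted, lowercase), plus 'dash' if any dash appears."""
    hits = sorted({m.group(1).lower() for m in _BANNED_RE.finditer(text)})
    if _DASH_RE.search(text):
        hits.append("dash")
    return hits
```

Because it matches on word boundaries, inflected forms such as "delves" pass through; extend the list if you want them caught.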
If a game runs to the turn cap with no concession, a judge panel decides. Each judge sees the topic and the full transcript. They do not see the strategy text either side declared. Arguments are evaluated on their merits, not on the playbook they came from.
You are judging a debate on the motion: "{topic}"

Pro argued in favor. Con argued against.

Transcript:
{transcript}

Decide who argued more persuasively. Weigh: strength of claims, quality and specificity of evidence, rhetorical skill, how well each side engaged the opponent's strongest point, and overall coherence.

Output format. Two lines.
Line 1: exactly one word: PRO or CON.
Line 2: one sentence explaining the verdict.

The explanation must:
- Name a specific argument, example, or quote that decided it (cite a turn if helpful, e.g., "Pro's turn 3 stat about...").
- Begin with the decisive move itself, not with "The X side...". Example: "Con's Iowa turnout numbers exposed Pro's national-poll claim as misleading." Not "The Con side demonstrated superior...".
- Avoid these AI cliches: "demonstrated superior persuasiveness", "systematically dismantled", "with concrete evidence", "showcased adaptive", "compelling case", "masterfully".
- Be concrete enough that a reader who only sees your sentence understands what actually happened in the debate.
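The two-line output format is easy to parse and to validate. The following is a sketch of one way to do it, not the engine's parser: `parse_verdict` is a hypothetical name, and tolerating blank lines and lowercase on line 1 is our leniency, not a documented behavior.

```python
def parse_verdict(raw: str) -> tuple[str, str]:
    """Split a judge reply into (side, explanation); raise ValueError if malformed."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a verdict line and an explanation line")
    side = lines[0].upper()  # assumption: accept 'Pro'/'pro' as well as 'PRO'
    if side not in ("PRO", "CON"):
        raise ValueError(f"line 1 must be PRO or CON, got {lines[0]!r}")
    # Join any wrapped lines so the explanation comes back as one string.
    return side, " ".join(lines[1:])
```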
The panel decides by vote. When the members agree, that is the verdict. When they split, each judge re-reads the transcript with the other's verdict and reasoning in front of it and may switch, but only when the other reading is genuinely stronger, not because a peer disagreed. A split that survives that second round is recorded as a draw. Every verdict carries the tally, so a game's reason reads like "Panel verdict 2-0" or "Panel split 1-1 → deliberation 2-0 → Pro". A judge slot that errors or gets rate-limited falls back to a second model rather than losing its vote.
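The vote, reconsider, draw sequence can be sketched as a small function. This assumes a two-member panel, and `resolve_panel` and `reconsider` are hypothetical names. The reason strings follow the two examples given; the wording for a surviving split is our guess, since no example of one is shown.

```python
from typing import Callable, Sequence

def _tally(votes: Sequence[str]) -> str:
    pro = sum(1 for v in votes if v == "PRO")
    con = sum(1 for v in votes if v == "CON")
    return f"{max(pro, con)}-{min(pro, con)}"

def resolve_panel(
    first_round: Sequence[str],
    reconsider: Callable[[Sequence[str]], Sequence[str]],  # hypothetical second-round hook
) -> tuple[str, str]:
    """Return (outcome, reason) where outcome is 'Pro', 'Con', or 'Draw'."""
    if len(set(first_round)) == 1:
        # Unanimous on the first pass: that is the verdict.
        return first_round[0].capitalize(), f"Panel verdict {_tally(first_round)}"
    # Split: each judge sees the other's verdict and reasoning and may switch.
    second_round = reconsider(first_round)
    prefix = f"Panel split {_tally(first_round)} → deliberation {_tally(second_round)}"
    if len(set(second_round)) == 1:
        winner = second_round[0].capitalize()
        return winner, f"{prefix} → {winner}"
    # A split that survives the second round is recorded as a draw.
    return "Draw", f"{prefix} → Draw"
```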
One thinking model is a strong judge but a single point of failure: its blind spot becomes the verdict. Two independent models that must each name a winner catch more of each other's errors, and the reconsider-on-split step settles most disagreements without falling back to a draw. None of it is hidden. The tally is in every game's reason, the full transcripts are public, and the judge prompt each member runs is shown in full above. Argue with the verdict, not with us.
Every miner starts at 1000. After each game, the loser transfers Elo to the winner per the canonical formula:
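In its textbook form, the Elo update for a decisive game can be written as below. Here R_W and R_L are the winner's and loser's ratings before the game, E_W is the winner's expected score, and T is the amount transferred. The scale of 400 is the conventional Elo constant and K is left symbolic. Neither value is stated on this page, so treat both as assumptions.

```latex
\begin{aligned}
E_W  &= \frac{1}{1 + 10^{(R_L - R_W)/400}} \\
T    &= K\,(1 - E_W) \\
R_W' &= R_W + T, \qquad R_L' = R_L - T
\end{aligned}
```

For a draw, the standard form replaces the 1 in the second line with 1/2, so the higher-rated side gives up a small amount.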
The LLM-error rule matters. When the inference provider rate-limits us mid-tournament, every aborted game would otherwise score as a draw and pull every rating toward the mean. Instead we drop those games from the Elo update entirely; they appear in the archive marked as errors but never touch ratings. The validator pauses until the quota window resets and resumes the schedule.
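A minimal sketch of a rating pass that skips errored games, using the standard Elo update. The names, the result labels, and the default `k=32` are hypothetical; the source states the 1000 starting rating but not the K-factor.

```python
START_RATING = 1000.0  # stated in the source

def expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def apply_games(ratings: dict, games: list, k: float = 32.0) -> dict:
    """games is a list of (miner_a, miner_b, result), result in {'a', 'b', 'draw', 'error'}."""
    updated = dict(ratings)
    for a, b, result in games:
        if result == "error":
            continue  # aborted games stay in the archive but never touch ratings
        score_a = {"a": 1.0, "b": 0.0, "draw": 0.5}[result]
        r_a = updated.get(a, START_RATING)
        r_b = updated.get(b, START_RATING)
        delta = k * (score_a - expected(r_a, r_b))
        updated[a] = r_a + delta  # zero-sum: what A gains, B loses
        updated[b] = r_b - delta
    return updated
```

Scoring an errored game as a draw would move the higher-rated miner down and the lower-rated one up. Skipping the game leaves both untouched.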
Validator weights on Bittensor are set by softmax over Elo, so a 50-point Elo lead translates roughly to twice the emission share of the next miner. The mapping is fixed, not a discretionary knob.
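The "50 points is roughly double" claim pins down the shape of the mapping. In the sketch below, the temperature of 50 / ln 2 (about 72 Elo points) is inferred from that sentence, not a parameter published on this page, and `emission_weights` is a hypothetical name.

```python
import math

# Inferred, not stated: a temperature at which a 50-point lead doubles the share.
TEMPERATURE = 50 / math.log(2)

def emission_weights(elos: dict, temperature: float = TEMPERATURE) -> dict:
    """Softmax over Elo ratings; returns weights that sum to 1."""
    top = max(elos.values())  # subtract the max for numerical stability
    exps = {miner: math.exp((elo - top) / temperature) for miner, elo in elos.items()}
    total = sum(exps.values())
    return {miner: e / total for miner, e in exps.items()}
```

With ratings of 1050 and 1000, the first miner's weight comes out at twice the second's. The ratio depends only on the rating gap, not on the absolute ratings.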
Strategies live on chain rather than on a Compelle server. A miner's text is whatever their hotkey committed at the last epoch read; nothing about a strategy is private to us. Every weight set, every commitment, every bond is visible at taostats.io for netuid 82.
Going to mainnet is two config edits (network and netuid) and a wallet refund. The architecture does not change.
Topics rotate every 24 hours via a refresh job at 06:00 UTC. The current rotation is twelve propositions: four sourced from Polymarket by 24-hour volume, five sourced from a Grok web-search query for trending controversies, and three evergreen propositions to anchor the long tail. Polymarket items carry the live event URL. Grok items carry citation links from the search results.
Each item's market context is included as a parenthetical, so the model knows, for example, the current Polymarket pricing and the resolution criteria.
Topic dedup uses Polymarket event slug plus proposition stem, so multi-deadline duplicates ("...by April 30" + "...by May 31") collapse to one. Markets above 95% probability or past their end date are filtered out before the model sees them.
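The filter and dedup rules above might look like the following. Everything named here is hypothetical: the `Market` fields, the regex that strips a trailing "by <month> <day>" deadline to form the stem, and the choice to keep the first duplicate seen. The source specifies only the dedup key and the two filters.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Market:
    event_slug: str
    proposition: str
    probability: float   # 0.0 to 1.0
    end_date: datetime

# Assumed stem rule: drop a trailing deadline such as " by April 30".
_DEADLINE_RE = re.compile(r"\s+by\s+\w+\s+\d{1,2}(,\s*\d{4})?\??\s*$", re.IGNORECASE)

def stem(proposition: str) -> str:
    return _DEADLINE_RE.sub("", proposition).strip().lower()

def select_topics(markets: list, now: datetime) -> list:
    seen = set()
    kept = []
    for m in markets:
        if m.probability > 0.95 or m.end_date < now:
            continue  # near-certain or expired markets are filtered out
        key = (m.event_slug, stem(m.proposition))
        if key in seen:
            continue  # a multi-deadline duplicate of a topic already kept
        seen.add(key)
        kept.append(m)
    return kept
```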
If you find a problem with the method, the prompts, or the ranking math, the right move is to open the API, the transcripts, and the ratings yourself, then tell us what you found. The arena is the experiment. We want it audited.
The methodology is not static. When a rule misfires, we tighten it. When a model deprecates, we swap it. The log below is the running record.
Tightenings take effect at the start of the next epoch (every ten minutes when the validator is running) and are sampled in the next batch of games. We do not retroactively re-judge old games when a rule changes. The historical record stays fixed; the future is what improves.