PredictionArena — Can AI predict the future? 🤖🔮
Every day at 10:00 ET, four top LLMs predict whether Bitcoin will be higher or lower in 24 hours.
Paused

⚡ Status & Support

Public runs are currently paused to control compute costs. If you’d like fresh results again, you can sponsor the next run using the crypto addresses below, or commission a custom experiment for another asset (ETH, gold, oil, stocks, or other commodities).

Prefer email? team@youraiconsultant.london

Sponsorship uses the existing addresses (BTC/ETH/BNB/USDT) below.

Leaderboard (All-time)

Model | Accuracy | Wins | Losses | Ties | NS
Legend: NS = No Submission (e.g., timeout, rate limited, invalid JSON).
The Anthropic entry has used Claude Sonnet 4.5 since Sep 29, 2025; earlier entries reflect Claude Opus 4.1.
Reference BTC (ET)
$—
This fixed opening price is the reference used to grade the +24h outcome.
Last Prediction
Next Prediction: 10:00 ET
Batch Runtime

Today’s Predictions (graded against today’s reference price at +24h)

Model | Vote | Conf. | Runtime | Details

Yesterday’s Outcome

No settlement yet.
📈 Historical Details

The contenders

Four top LLMs compete under the same prompt template and evidence budget. No fine-tuning; we publish raw JSON daily so you can audit every call.

  • GPT (OpenAI) — Generalist flagship with strong reasoning and synthesis; balanced narratives that weigh multiple sources. openai.com
  • Gemini (Google) — Web-savvy, concise rationales; quick to incorporate headlines and public stats; occasionally more risk-on. ai.google
  • Claude Sonnet 4.5 (Anthropic) — Careful, safety-driven style; clean argumentation and caveats; tends to avoid overfitting to thin signals. (Previously: Claude Opus 4.1 until Sep 29, 2025 — older leaderboard results include Opus history.) anthropic.com
  • Grok (xAI) — Punchy, news-aware tone; can be bolder when signals align. x.ai
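Because every model answers the same prompt template in JSON, each reply has to be checked before it can be graded. The sketch below assumes a hypothetical schema with `vote`, `confidence` (0 to 1), `rationale`, and `sources` fields; the experiment's real schema may differ. Anything missing, unparseable, or out of range maps to `None`, which the leaderboard would record as NS.

```python
import json
from typing import Optional, TypedDict


class Vote(TypedDict):
    # Hypothetical schema: the real prompt template may use other field names.
    vote: str          # "higher" or "lower"
    confidence: float  # assumed to be a probability in [0, 1]
    rationale: str
    sources: list


VALID_VOTES = {"higher", "lower"}


def parse_vote(raw: Optional[str]) -> Optional[Vote]:
    """Return a validated vote, or None (recorded as NS) if the reply is unusable."""
    if raw is None:  # timeout or rate limit: nothing came back
        return None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:  # invalid JSON
        return None
    if not isinstance(data, dict):
        return None
    vote = str(data.get("vote", "")).strip().lower()
    if vote not in VALID_VOTES:
        return None
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        return None
    if not 0.0 <= confidence <= 1.0:  # also rejects NaN
        return None
    sources = data.get("sources", [])
    return Vote(
        vote=vote,
        confidence=confidence,
        rationale=str(data.get("rationale", "")),
        sources=sources if isinstance(sources, list) else [],
    )
```

Normalizing the vote to lowercase before checking it keeps a harmless "Higher" from being scored as NS, while a truly unusable reply still is.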

📝 Blog

We publish short takes on AI forecasting, AGI progress, and lessons from this experiment.

About the Experiment

Background

“Can AI predict the future?” is more than a headline; it’s a transparency challenge. Every day at 10:00 ET we ask four top LLMs one question: will BTC be higher or lower in 24 hours? We show their votes, reasons, and sources, then grade them publicly the next day. If the absolute price move from the reference is less than 0.25%, we call it a tie.
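The grading rule can be sketched as follows. This is a minimal illustration, not the site's code: it assumes the 0.25% threshold is measured relative to the 10:00 ET reference price, and that a model with no valid "higher"/"lower" vote scores NS. `settle`, `grade`, and `Outcome` are hypothetical names.

```python
from enum import Enum
from typing import Optional

# Assumption: "0.25%" means 0.25% of the reference price.
TIE_THRESHOLD = 0.0025


class Outcome(str, Enum):
    HIGHER = "higher"
    LOWER = "lower"
    TIE = "tie"


def settle(ref_price: float, price_24h: float, threshold: float = TIE_THRESHOLD) -> Outcome:
    """Classify the +24h move against the reference price."""
    if ref_price <= 0:
        raise ValueError("reference price must be positive")
    move = (price_24h - ref_price) / ref_price  # relative move, e.g. 0.01 == +1%
    if abs(move) < threshold:
        return Outcome.TIE
    return Outcome.HIGHER if move > 0 else Outcome.LOWER


def grade(vote: Optional[str], outcome: Outcome) -> str:
    """Score one model: 'win', 'loss', 'tie', or 'NS' if it gave no valid vote."""
    if vote not in (Outcome.HIGHER.value, Outcome.LOWER.value):
        return "NS"  # assumption: missing or invalid votes count as No Submission
    if outcome is Outcome.TIE:
        return "tie"
    return "win" if vote == outcome.value else "loss"
```

Under this sketch, a move from $100,000 to $100,200 (+0.2%) settles as a tie, while $100,000 to $101,000 (+1%) settles as higher.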

Setup

  • Reference price from a public market-data endpoint (Binance), timestamped in Eastern Time (ET).
  • Prompts focus on ETF flows, credible headlines, derivatives, macro calendar, and exchange news.
  • We publish raw JSON daily. No database, no black boxes.
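A minimal sketch of fetching the reference price and writing the daily JSON. The page names Binance but not the trading pair or candle interval, so this assumes the BTCUSDT pair and the open of the 1-minute candle that starts at 10:00 ET. It calls Binance's public `GET /api/v3/klines` endpoint and uses Python's `zoneinfo` so daylight saving time is handled. `daily_record` is a hypothetical helper; the experiment's real JSON layout may differ.

```python
import json
import urllib.parse
import urllib.request
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

BINANCE_KLINES = "https://api.binance.com/api/v3/klines"
ET = "America/New_York"


def reference_time_ms(day: date, tz_name: str = ET) -> int:
    """Epoch milliseconds of 10:00 local time on `day` (DST-aware)."""
    local = datetime.combine(day, time(10, 0), tzinfo=ZoneInfo(tz_name))
    return int(local.timestamp() * 1000)


def open_price_from_klines(klines: list) -> float:
    """Open price of the first candle; Binance returns prices as strings."""
    if not klines:
        raise ValueError("no candle returned for the reference minute")
    return float(klines[0][1])  # index 1 of each kline is the open price


def fetch_reference_price(day: date, symbol: str = "BTCUSDT") -> float:
    """Network call: open of the 1-minute candle starting at 10:00 ET (assumed pair/interval)."""
    query = urllib.parse.urlencode({
        "symbol": symbol,
        "interval": "1m",
        "startTime": reference_time_ms(day),
        "limit": 1,
    })
    with urllib.request.urlopen(f"{BINANCE_KLINES}?{query}", timeout=10) as resp:
        return open_price_from_klines(json.load(resp))


def daily_record(day: date, ref_price: float, votes: dict) -> str:
    """Serialize one day's reference price and per-model votes as raw JSON."""
    return json.dumps(
        {"date": day.isoformat(), "ref_price": ref_price, "votes": votes},
        indent=2,
        sort_keys=True,
    )
```

Pinning the reference to a fixed candle (rather than "latest price at run time") keeps the grade independent of how long the model batch takes to run.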

Why BTC?

BTC is liquid, news-sensitive, and globally traded, which makes it a good fit for a fast daily cycle. The same approach can extend to ETH, gold, oil, and major stocks; the main constraints are time and funding.