PredictionArena — Can AI Predict the Future?

Leaderboard (All-time)

Model	Accuracy	Wins	Losses	Ties	NS

Legend: NS = No Submission (e.g., timeout, rate limited, invalid JSON).

Anthropic has been represented by Claude Sonnet 4.5 since Sep 29, 2025. Earlier entries reflect Claude Opus 4.1.

Reference BTC (ET)

$—

—

This is the fixed opening reference price used to grade each prediction at +24h.

Last Prediction

—

Next Prediction

10:00 ET

Batch Runtime

—

Today’s Predictions (vs today’s ref @ +24h)

Model	Vote	Conf.	Runtime	Details

Yesterday’s Outcome

No settlement yet.

📈 Historical Details

The contenders

Four top LLMs compete under the same prompt template and evidence budget. No fine-tuning; we publish raw JSON daily so you can audit every call.

GPT (OpenAI) — Generalist flagship with strong reasoning and synthesis; balanced narratives that weigh multiple sources. openai.com
Gemini (Google) — Web-savvy, concise rationales; quick to incorporate headlines and public stats; occasionally more risk-on. ai.google
Claude Sonnet 4.5 (Anthropic) — Careful, safety-driven style; clean argumentation and caveats; tends to avoid overfitting to thin signals. (Previously: Claude Opus 4.1 until Sep 29, 2025 — older leaderboard results include Opus history.) anthropic.com
Grok (xAI) — Punchy, news-aware tone; can be bolder when signals align. x.ai

🚀 Sponsor a Run (or Support the Experiment)

Public runs are paused to control costs. If you’d like fresh results again, you can sponsor the next run using the addresses below. Want something bespoke (other commodities / stocks / a private arena)? Email team@youraiconsultant.london.

BTC ₿

bc1qym4dprpq6a4t4h5dldaw485w40sgdju3h89yx5

ETH 🪙

0x4A07AB923C78e773a5E9cA1448E51a3DB2C4103c

BNB 🟡

0x4A07AB923C78e773a5E9cA1448E51a3DB2C4103c

USDT · TRC-20 💵

TCypPuXMgMVsAmHekBeDuc4GQU8zPGitPK

📝 Blog

We publish short takes on AI forecasting, AGI progress, and lessons from this experiment.

Did We Just Cross an AGI Threshold? What Gemini 2.5 & GPT-5 at ICPC Really Mean — ICPC demos, what they indicate (and don’t) about AGI progress.
How We Grade LLM Predictions: A Simple, Auditable Method — Our daily BTC grading rules (win/loss/tie/NS) and reproducibility.

About the Experiment

Background

“Can AI predict the future?” is more than a headline: it is a transparency challenge. Every day at 10:00 ET we ask four top LLMs the same question: will BTC be higher or lower in 24 hours? We show their votes, reasons, and sources, then grade them publicly the next day against the fixed reference price. If the absolute move is under 0.25%, we call it a tie.
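
The grading rule above can be sketched in a few lines of Python. This is a minimal illustration, not our production grader: the function name `grade`, the vote labels "higher" and "lower", and the choice to check for a missing vote before checking for a tie are all assumptions made for the example. Only the 0.25% tie threshold comes from the rule itself.

```python
from typing import Optional

# From the rule above: absolute moves below this percentage count as a tie.
TIE_THRESHOLD_PCT = 0.25


def grade(vote: Optional[str], ref_price: float, price_24h: float) -> str:
    """Return 'win', 'loss', 'tie', or 'ns' for one model's daily call.

    Assumption: a missing or unrecognised vote is graded NS before the
    tie check is applied.
    """
    if vote not in ("higher", "lower"):
        return "ns"

    # Signed percentage move from the fixed reference to the +24h price.
    move_pct = (price_24h - ref_price) / ref_price * 100.0

    if abs(move_pct) < TIE_THRESHOLD_PCT:
        return "tie"

    actual = "higher" if move_pct > 0 else "lower"
    return "win" if vote == actual else "loss"
```

For example, with a reference of 100,000 and a +24h price of 100,100 the move is 0.1%, so every submitted vote would be graded a tie regardless of direction.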

Setup

The reference price comes from a public market-data endpoint (Binance), with times in ET.
Prompts focus on ETF flows, credible headlines, derivatives, macro calendar, and exchange news.
We publish raw JSON daily. No database, no black boxes.
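
To make the "raw JSON" idea concrete, here is a hedged sketch of how a single model reply might be validated before grading, with anything unparseable treated as NS (as in the leaderboard legend). The field names `vote`, `confidence`, and `rationale`, the 0 to 1 confidence range, and the function name `parse_submission` are hypothetical; the published files may use a different shape.

```python
import json
from typing import Optional

# Hypothetical vote labels; the real files may use different ones.
VALID_VOTES = {"higher", "lower"}


def parse_submission(raw: str) -> Optional[dict]:
    """Return a cleaned record, or None when the reply should count as NS."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # invalid JSON is one of the NS cases in the legend

    if not isinstance(data, dict):
        return None

    vote = str(data.get("vote", "")).lower()
    if vote not in VALID_VOTES:
        return None

    # Assumption: confidence is a number between 0 and 1.
    confidence = data.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0 <= confidence <= 1:
        return None

    return {
        "vote": vote,
        "confidence": float(confidence),
        "rationale": str(data.get("rationale", "")),
    }
```

Returning `None` instead of raising keeps the daily batch running when one model times out or replies with malformed output, which is what the NS column is meant to capture.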

Why BTC?

BTC is liquid, news-sensitive, and globally traded, which makes it a good fit for a fast, daily cycle. The same approach can extend to ETH, gold, oil, and major stocks; that is mainly a question of time and funding.

⚡ Status & Support