A strictly proper scoring rule that measures the accuracy of probabilistic forecasts as the mean squared difference between predicted probabilities and binary (0/1) outcomes. Lower scores are better, with 0 for a perfect forecast; the score rewards both calibration and sharpness.
Cluster: Information Theory
Referenced in 2 articles
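As a minimal illustration of the definition, the sketch below computes the score directly from its formula: the average of (p - o)² over all forecasts, where p is the forecast probability and o is 1 if the event happened and 0 otherwise. The function name `brier_score` is chosen only for this example.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Three hypothetical forecasts; the first and third events resolved YES.
print(brier_score([0.9, 0.2, 0.7], [1, 0, 1]))  # about 0.0467
```

A forecaster who is always certain and always right scores 0, and one who is always certain and always wrong scores 1.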
Critique of the narrative that prediction markets are truth machines. Polymarket's headline Brier score of 0.047 masks category-specific failures such as sports markets scoring 0.325, worse than the 0.25 earned by always forecasting 50% (a coin flip). In addition, 99% of volume concentrates in the final hours before resolution. The author argues that prediction markets work on only roughly 2% of listed contracts (binary, high-profile, short-term events with millions at stake), and that when outlets like CNN and WSJ broadcast illiquid market odds as an authoritative signal, whale trades on thin books get laundered through credible newsrooms.
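The "coin flip" comparison can be checked with a short sketch. A constant 50% forecast scores (0.5)² = 0.25 whatever the outcome, so any score above 0.25 is worse than having no information at all. The sketch also expresses the quoted scores as a Brier skill score, 1 - BS / BS_ref. Using the coin flip as the reference forecast is an assumption made here for illustration. The outcome list is hypothetical, and the helper names are illustrative.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def skill_vs_coin_flip(score: float) -> float:
    """Brier skill score against an always-50% reference forecast (score 0.25).
    Positive means better than a coin flip; negative means worse."""
    return 1.0 - score / 0.25


# Hypothetical outcomes: a constant 0.5 forecast scores 0.25 regardless of them.
outcomes = [1, 0, 0, 1, 1, 0, 1, 0]
baseline = brier_score([0.5] * len(outcomes), outcomes)
print(baseline)  # 0.25

# Scores quoted in the summary: headline vs. sports markets.
for label, score in [("headline", 0.047), ("sports", 0.325)]:
    print(label, round(skill_vs_coin_flip(score), 3))
```

A negative skill score for sports markets is simply another way of saying they scored worse than a coin flip.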
Presents ForecastBench, a benchmark that tracks how well LLMs forecast real-world outcomes compared with superforecasters and crowd forecasters. The best LLM (GPT-4.5) achieves a Brier score of 0.101 versus superforecasters' 0.081. LLM scores are improving (falling) by roughly 0.016 Brier points per year, which projects parity by late 2026. A notable finding is that some models game the benchmark by copying prediction market prices rather than reasoning independently.
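The parity projection is essentially a linear extrapolation: a gap of 0.101 - 0.081 = 0.020 Brier points, closing at about 0.016 points per year, closes in about 1.25 years. The sketch below does that arithmetic. It assumes the improvement rate stays constant. The summary does not say when the 0.101 score was measured, so the sketch computes only the time span and no calendar date. `years_to_parity` is a hypothetical helper.

```python
def years_to_parity(model_brier: float, target_brier: float,
                    improvement_per_year: float) -> float:
    """Years until model_brier falls to target_brier, assuming a constant
    linear improvement rate (Brier points per year, lower is better)."""
    if improvement_per_year <= 0:
        raise ValueError("improvement rate must be positive")
    gap = max(model_brier - target_brier, 0.0)  # already at parity -> 0 years
    return gap / improvement_per_year


# Numbers from the summary: best LLM 0.101, superforecasters 0.081, 0.016/year.
print(years_to_parity(0.101, 0.081, 0.016))  # about 1.25
```

Real improvement curves tend to flatten as scores approach the limit set by irreducible uncertainty, so a straight-line projection like this one is best read as an optimistic rough estimate.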