When stated probabilities match empirical frequencies—e.g., events given 70% odds happen 70% of the time.
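One way to check this definition against data is a binned comparison of stated probabilities and observed frequencies. The sketch below is illustrative only: the function name `calibration_table`, the equal-width binning, and the sample data are assumptions, not taken from any of the referenced articles.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins and return, per bin,
    (mean stated probability, observed frequency, number of forecasts)."""
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        # A forecast of exactly 1.0 is clamped into the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(y for _, y in pairs) / len(pairs)
        rows.append((mean_p, freq, len(pairs)))
    return rows


# Hypothetical data: ten events each given 70% odds, seven of which happened.
forecasts = [0.7] * 10
outcomes = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
table = calibration_table(forecasts, outcomes)
```

A forecaster is well calibrated when, in every bin, the mean stated probability and the observed frequency are close. With real data each bin needs enough forecasts for the frequency to be meaningful.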
Cluster: Information Theory
Referenced in 8 articles
Data-driven analysis benchmarking Kalshi's sports markets against traditional sportsbooks. Kalshi's monthly volume grew 80x to $14.4B in March 2026, with NCAA March Madness generating $3.3B in notional volume, comparable to the total amount wagered in the US on the tournament. In-game prices correlate at 0.99+ with FanDuel, but Kalshi's taker fees (up to 3.5% at midpoint) and thinner in-game liquidity (76% depth decline vs pre-game) currently limit institutional execution. Includes a valuation comparison showing Kalshi priced as an exchange ($20B) vs sportsbooks trading at 2-4x revenue.
Critique of the narrative that prediction markets are truth machines. Polymarket's headline Brier score of 0.047 masks category-specific failures like sports markets scoring 0.325 (worse than the 0.25 a coin flip would score), and 99% of volume concentrates in the final hours before resolution. The author argues prediction markets only work on roughly 2% of listed contracts (binary, high-profile, short-term events with millions at stake), and that when outlets like CNN and WSJ broadcast illiquid market odds as authoritative signal, whale trades on thin books get laundered through credible newsrooms.
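For context on the coin-flip comparison: the Brier score is the mean squared difference between forecast probabilities and binary outcomes, so lower is better, and a forecaster who always says 50% scores exactly 0.25. The sketch below shows that baseline. The outcome list and the "confident" forecasts are made-up illustrations, not data from the article.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical resolved binary events (1 = happened, 0 = did not).
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]

# Always forecasting 50% gives (0.5)^2 = 0.25 on every event, whatever the outcome.
coin_flip = brier_score([0.5] * len(outcomes), outcomes)

# A forecaster who is confident and right on every event scores far lower.
confident = brier_score([0.9 if y == 1 else 0.1 for y in outcomes], outcomes)
```

Any score above 0.25, such as the 0.325 cited for sports markets, means the prices carried less information than a constant 50% forecast would have.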
Proposes a workflow for using prediction market probabilities as inputs to equity valuation models. Walks through two case studies: translating Polymarket's 51% tariff refund probability into a 35% effective probability for Logitech's margin impact, and converting a 29.5% FDA approval probability into a $5.4B probability-weighted EV uplift for Eli Lilly. The key insight is that raw market probabilities must be adjusted for contract wording mismatches and economic relevance before they become useful for stock analysis.
Challenges the smart-versus-dumb money dichotomy in prediction markets by synthesizing research from Snowberg/Wolfers/Zitzewitz, INSEAD's BIN model, and Wharton's cognitive search framework. Argues that noisy traders fund the probability space rather than serve as exit liquidity, and compares how binary CLOBs versus continuous probability markets decompose and harness noise differently.
Reviews Philip Tetlock's Superforecasting and draws a direct line from the book's core thesis — that forecasting skill is measurable, trainable, and outperforms expert punditry — to Polymarket's success during the 2024 US election. Explains Tetlock's key concepts (foxes vs hedgehogs, the Good Judgment Project, Brier scores, calibration) and argues that Polymarket effectively operationalized Tetlock's framework at scale by converting crowd forecasting into a liquid financial market.
Fits a Bayesian hierarchical model to 292 million trades across 327,000 contracts on Kalshi and Polymarket to decompose calibration errors into structured components: universal horizon effects, domain-specific biases, and trade-size scale effects, which together explain 87.3% of variance on Kalshi. Finds persistent underconfidence in political markets where prices compress toward 50%, and shows that large trades amplify this pattern on Kalshi but not on Polymarket, pointing to platform-specific microstructure differences.
Questions whether prediction markets are capturing the right signal. Argues binary yes/no markets flatten complex beliefs into coin flips, losing the precision that separates superforecasters from average predictors. Uses the 2024 French whale trader ($30M moving election odds) and a Vanderbilt study (PredictIt at 93% accuracy vs 67% on high-volume platforms) to argue that more liquidity does not mean better signal.
Presents ForecastBench, a benchmark tracking how well LLMs forecast real-world outcomes against superforecasters and crowd forecasters. The best LLM (GPT-4.5) achieves a Brier score of 0.101 versus superforecasters' 0.081, with LLMs improving roughly 0.016 Brier points per year, projecting parity by late 2026. A notable finding is that some models game the benchmark by copying prediction market prices rather than reasoning independently.
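The parity projection can be reproduced as back-of-the-envelope arithmetic from the three figures quoted in the summary. This sketch assumes a simple linear extrapolation with the superforecaster score held constant. That is an assumption made here for illustration and may not match the benchmark authors' actual method.

```python
# Figures quoted in the summary above.
llm_brier = 0.101              # best LLM (GPT-4.5)
superforecaster_brier = 0.081
improvement_per_year = 0.016   # approximate LLM improvement in Brier points per year

# Assumption: linear improvement, superforecaster score unchanged.
gap = llm_brier - superforecaster_brier
years_to_parity = gap / improvement_per_year

print(f"gap = {gap:.3f}, years to parity = {years_to_parity:.2f}")
```

Under these assumptions the gap of 0.020 closes in about a year and a quarter from whenever the scores were measured. The summary does not state that date.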