Applications·Research Paper

Design and Evaluation of Multi-Agent AI Oracle Systems for Prediction Market Resolution

Tarun Kota·May 29, 2026·Academic Paper
Multiple AI agents voting together beat single models, but letting them deliberate makes predictions worse.

Why It's Worth Reading

Evaluates whether multi-agent LLM architectures can resolve prediction market outcomes more accurately than single-model baselines. Tests independent aggregation and deliberative consensus against GPT-5 Nano, DeepSeek V3, and Llama-3.3-70B on 1,189 resolved questions from KalshiBench. Finds that confidence-weighted voting across agents slightly outperforms single models, while deliberation degrades accuracy, and proposes a hybrid system that auto-resolves unanimous high-confidence questions and flags disagreements for human review.
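To make the two mechanisms named in the summary concrete, here is a minimal Python sketch of confidence-weighted voting and of the hybrid routing rule. It is an illustration only, not the paper's implementation: the `AgentVote` type, the function names, the YES/NO outcome labels, and the 0.9 confidence threshold are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentVote:
    agent: str         # hypothetical agent label
    outcome: str       # e.g. "YES" or "NO"
    confidence: float  # self-reported confidence in [0, 1]


def confidence_weighted_vote(votes):
    """Return (winning outcome, its share of the total confidence mass)."""
    if not votes:
        raise ValueError("at least one vote is required")
    totals = defaultdict(float)
    for v in votes:
        totals[v.outcome] += v.confidence
    winner = max(totals, key=totals.get)
    total = sum(totals.values())
    share = totals[winner] / total if total else 0.0
    return winner, share


def resolve(votes, min_confidence=0.9):
    """Auto-resolve only unanimous, high-confidence questions.

    min_confidence is an assumed threshold; everything else is
    flagged for human review with the weighted vote as a suggestion.
    """
    winner, share = confidence_weighted_vote(votes)
    unanimous = len({v.outcome for v in votes}) == 1
    confident = all(v.confidence >= min_confidence for v in votes)
    if unanimous and confident:
        return {"status": "auto_resolved", "outcome": winner}
    return {"status": "needs_review", "outcome": winner, "share": share}


agreed = [AgentVote("a", "YES", 0.95), AgentVote("b", "YES", 0.92),
          AgentVote("c", "YES", 0.97)]
split = [AgentVote("a", "YES", 0.6), AgentVote("b", "NO", 0.9),
         AgentVote("c", "YES", 0.5)]
print(resolve(agreed))
print(resolve(split))
```

In the split case the two lower-confidence YES votes together outweigh the single NO vote, but because the agents disagree the question is routed to a human rather than resolved automatically.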

Some technical background helpful

Concepts

Platforms mentioned: Kalshi

Related Reading