Building an AI Agent That Trades Prediction Markets
Prediction markets are, in a sense, the ideal playground for AI agents. They reduce complex real-world questions to binary or scalar outcomes with clear payoffs. They aggregate information from thousands of participants into a single market price. And they reward anyone whose probability estimates are more accurate than the consensus, which is exactly what well-trained LLMs tend to be good at.
This tutorial builds a complete prediction market trading agent from scratch. By the end, you will have a running agent that queries open markets, uses an LLM to assess each question's probability, compares that assessment to the market price, places Kelly-sized bets where there's edge, and tracks performance over time.
How Prediction Markets Work
Prediction markets are order books where you buy shares in outcomes. On a binary market, you can buy YES or NO shares. Each share pays out exactly $1.00 if that outcome occurs and $0.00 if it does not. A YES share priced at $0.60 implies the market believes there is a 60% chance the event happens.
This price is the market's consensus probability estimate. If you believe the true probability is higher than 60%, YES shares are underpriced relative to their expected value. If you believe it is lower, NO shares are the play.
The key insight for AI agents: prediction markets are fundamentally information-processing competitions. The market price reflects the aggregate of all public information plus the relative skill of participants at interpreting that information. An AI agent with access to good data and a calibrated reasoning model can find systematic edges, not through luck, but through genuinely better probability estimation.
What "edge" means in this context: Edge = your assessed probability minus the market price. A 10% edge means you believe the true probability is 10 percentage points higher than what the market is pricing. Over many bets, if your edge is real, you will profit. If your edge is imaginary, you will lose systematically.
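The edge and expected-value arithmetic above can be sketched in a few lines. This is an illustrative snippet (the `yes_share_ev` helper is hypothetical, not part of any market API): a YES share pays $1.00 on resolution, so its expected profit is your assessed probability times $1.00 minus the price paid.

```python
def yes_share_ev(p_true: float, price: float) -> float:
    """Expected profit per YES share: p_true * $1.00 payout minus the price paid."""
    return p_true * 1.00 - price

# The market prices YES at $0.60; you assess the true probability at 70%.
edge = 0.70 - 0.60             # a 10-percentage-point edge
ev = yes_share_ev(0.70, 0.60)  # $0.10 expected profit per $0.60 share
print(f"edge={edge:+.0%}  EV per share=${ev:.2f}")
```

If your assessed probability were below the price (say 50% against a $0.60 price), the same arithmetic returns a negative EV for YES, which is exactly when NO shares become the play.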
Where AI Agents Find Edge
There are three sources of systematic edge that AI agents are well-positioned to exploit in prediction markets:
- Information synthesis speed: LLMs can rapidly process news articles, social media sentiment, on-chain data, and prior resolution patterns to form probability estimates. Human traders are slower at this synthesis, creating time windows where the market price lags new information.
- Calibration against known biases: Research on prediction markets consistently finds a favorite-longshot bias: participants overprice unlikely events and underprice near-certainties. A well-calibrated model can exploit these systematic distortions with consistent positive expectation.
- Scale of market monitoring: A human trader might follow 20 markets closely. An AI agent can monitor 1,000+ simultaneously, finding the small subset where its edge is most significant and allocating capital accordingly. Across many small edges, the law of large numbers pulls realized returns toward their positive expectation.
The edge is not guaranteed and will erode as more AI agents participate in these markets. The agents that win will be those with the best calibration, the most relevant data sources, and the most disciplined position sizing. This tutorial covers all three.
Step 1: Query Open Markets
Start by connecting to Purple Flea's prediction market API and pulling a filtered list of open markets. Target markets with enough liquidity to take meaningful positions, closing in a medium-term window where information changes slowly enough for the LLM's advantage to matter, sorted by 24-hour volume.
```python
import purpleflea
from datetime import datetime, timedelta

pred = purpleflea.PredictionClient(api_key="YOUR_KEY")

# Query markets suitable for LLM-based trading
markets = pred.list_markets(
    min_liquidity_usd=5000,  # enough to take $50+ positions
    closes_after=datetime.now() + timedelta(days=3),
    closes_before=datetime.now() + timedelta(days=90),
    categories=["crypto", "politics", "economics"],
    sort_by="volume_24h",
    limit=100,
)

print(f"Found {len(markets)} tradeable markets")
for m in markets[:5]:
    print(
        f"\n{m['question'][:70]}"
        f"\n  YES={m['yes_price']:.2f} | NO={m['no_price']:.2f}"
        f" | liq=${m['liquidity_usd']:,.0f}"
        f" | closes {m['closes_at'][:10]}"
    )
```
Step 2: LLM Probability Assessment
For each market, prompt an LLM to estimate the probability of the YES outcome. The prompt structure matters enormously. You want the model to reason through available evidence, acknowledge uncertainty, and produce a calibrated numeric estimate. Prompts that just ask for a gut feeling produce uncalibrated outputs.
Provide the market consensus price in your prompt. This serves two purposes: it anchors the model's thinking to what the collective market believes, and it forces the model to justify why its estimate differs from a diverse group of participants who also have information about the same event.
```python
import anthropic, json, re

client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_KEY")

def assess_probability(question: str, market_price: float) -> dict:
    prompt = f"""You are a calibrated probability assessor for prediction markets.

Market question: {question}
Current market consensus: {market_price:.1%} probability of YES

Please estimate the probability of the YES outcome. Consider base rates,
your knowledge of the domain, and whether you have genuine information
advantage over the market consensus. Be conservative: if you are uncertain,
shade toward the market price.

Return JSON only: {{"probability": 0.XX, "confidence": "low|medium|high", "reasoning": "..."}}"""
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=400,
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.content[0].text
    match = re.search(r'\{[^}]+\}', text, re.DOTALL)
    return json.loads(match.group()) if match else None

# Screen each market for edge
opportunities = []
for market in markets:
    result = assess_probability(market["question"], market["yes_price"])
    if result:
        edge = result["probability"] - market["yes_price"]
        opportunities.append({**market, **result, "edge": edge})

# Filter to actionable edges only
high_edge = [o for o in opportunities if abs(o["edge"]) >= 0.10]
print(f"{len(high_edge)} markets with 10%+ edge")
```
Step 3: Kelly Criterion Bet Sizing
Once you have an edge estimate, bet sizing determines whether you survive long enough to let the edge compound. The Kelly criterion gives you the mathematically optimal fraction of your bankroll to bet on each opportunity to maximize long-run growth rate.
For a binary prediction market, the Kelly fraction is f = (p × b - q) / b, where p is your probability estimate for the outcome you are buying, q = 1 - p, and b = (1 / market_price) - 1 is the net odds on that outcome. Most practitioners use 25-50% of the full Kelly fraction, because full Kelly produces variance that often feels intolerable even when it is mathematically optimal.
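Plugging numbers into the formula makes the sizing concrete. With an assessed probability of 70% against a $0.60 YES price (the hedge: these example numbers are chosen for clean arithmetic, not taken from any real market), full Kelly says to bet a quarter of the bankroll, and quarter Kelly scales that to 6.25%:

```python
p, price, scale = 0.70, 0.60, 0.25

q = 1 - p                     # probability of losing the bet
b = (1 / price) - 1           # net odds: risk $0.60 to win $0.40, so b ~ 0.667
full_kelly = (p * b - q) / b  # = 0.25 of bankroll at full Kelly
stake = full_kelly * scale    # = 0.0625 of bankroll at quarter Kelly

print(f"full Kelly: {full_kelly:.2%} of bankroll, quarter Kelly: {stake:.2%}")
```

Note how quickly the fraction shrinks as the edge shrinks: at p = 0.65 against the same price, full Kelly drops to 12.5% of bankroll.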
```python
def kelly_fraction(p: float, market_price: float, scale: float = 0.25) -> float:
    """Fractional Kelly bet size (default 25% of full Kelly)."""
    q = 1 - p
    b = (1 / market_price) - 1  # net payout odds on the share being bought
    full_kelly = (p * b - q) / b
    return max(0, full_kelly * scale)

bankroll = 500.0  # $500 USDC starting capital

for opp in sorted(high_edge, key=lambda x: abs(x["edge"]), reverse=True):
    # Skip low-confidence assessments
    if opp["confidence"] == "low":
        continue
    direction = "YES" if opp["edge"] > 0 else "NO"
    price = opp[f"{direction.lower()}_price"]
    # Kelly needs the win probability of the share actually being bought:
    # the assessed YES probability for YES bets, its complement for NO bets
    p_win = opp["probability"] if direction == "YES" else 1 - opp["probability"]
    frac = kelly_fraction(p_win, price)
    size = min(bankroll * frac, 50)  # hard cap per bet
    if size < 5 or bankroll < size:
        continue
    bet = pred.place_bet(
        market_id=opp["market_id"],
        outcome=direction,
        amount_usdc=round(size, 2),
        max_price=price * 1.02,  # accept up to 2% slippage
    )
    bankroll -= size
    print(f"{direction} ${size:.2f} edge={opp['edge']:+.1%} q={opp['question'][:50]}")
```
Step 4: The Full Agent Loop
Now stitch the pieces into a continuous loop. The agent wakes every 4 hours, checks for resolved positions and logs their outcomes against predictions, queries new markets, skips any market where it already holds a position, and places new bets on fresh opportunities.
```python
import time, logging

log = logging.getLogger("pred_agent")
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

def run_cycle(bankroll: float) -> float:
    # 1. Collect resolved market outcomes
    resolved = pred.list_resolved(agent_id="agent_pred", days_back=1)
    for r in resolved:
        bankroll += r["pnl_usdc"]
        log.info(f"Resolved: {r['question'][:45]} PnL=${r['pnl_usdc']:+.2f}")

    # 2. Find open positions to avoid doubling up
    open_ids = {p["market_id"] for p in pred.get_positions(agent_id="agent_pred")}

    # 3. Pull fresh markets and filter to new ones
    markets = pred.list_markets(min_liquidity_usd=5000, sort_by="volume_24h", limit=80)
    fresh = [m for m in markets if m["market_id"] not in open_ids]
    log.info(f"{len(fresh)} new markets to assess")

    # 4. Assess and bet on up to 30 markets per cycle (LLM cost control)
    placed = 0
    for m in fresh[:30]:
        r = assess_probability(m["question"], m["yes_price"])
        if not r or r["confidence"] == "low":
            continue
        edge = r["probability"] - m["yes_price"]
        if abs(edge) < 0.10:
            continue
        direction = "YES" if edge > 0 else "NO"
        price = m[f"{direction.lower()}_price"]
        # Win probability of the share being bought (complement for NO bets)
        p_win = r["probability"] if direction == "YES" else 1 - r["probability"]
        size = min(bankroll * kelly_fraction(p_win, price), 50)
        if size >= 5 and bankroll >= size:
            pred.place_bet(market_id=m["market_id"], outcome=direction,
                           amount_usdc=round(size, 2), max_price=price * 1.02)
            bankroll -= size
            placed += 1

    log.info(f"Placed {placed} bets | Bankroll ${bankroll:.2f}")
    return bankroll

bankroll = 500.0
while True:
    bankroll = run_cycle(bankroll)
    time.sleep(14400)  # 4-hour cycle
```
Portfolio Management Across Many Markets
As your agent accumulates positions across many markets, portfolio-level thinking becomes important. You do not want heavy exposure to correlated markets: if your agent holds YES positions on 10 different questions that all depend on whether the Fed cuts rates, that is one large bet dressed as 10 smaller ones.
Implement these guardrails in your agent loop:
- Category concentration limit: No more than 30% of total exposure in any single category (crypto, politics, sports, economics). Check portfolio concentration before placing each bet.
- Correlation screening: Before placing a new bet, compare its topic to existing open positions. If two questions share a primary resolution driver, treat them as the same bet for sizing purposes.
- Maximum simultaneous positions: Cap at 20-30 positions to keep your LLM assessments high quality. More positions spread the assessment budget thinner, lowering per-market accuracy.
- Resolution calendar smoothing: Avoid having more than 40% of positions resolving in the same week. Variance spikes when many bets resolve simultaneously.
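The first guardrail above, the category concentration limit, can be sketched as a pre-trade check. This is a minimal illustration, not part of the market API: the `category` and `amount_usdc` field names are assumptions about the position records your agent keeps.

```python
from collections import defaultdict

def category_exposure(positions: list[dict]) -> dict:
    """Sum open exposure (USDC) per market category."""
    totals = defaultdict(float)
    for p in positions:
        totals[p["category"]] += p["amount_usdc"]
    return dict(totals)

def within_category_limit(positions: list[dict], new_bet: dict,
                          limit: float = 0.30) -> bool:
    """Reject a bet that would push one category above `limit` of total exposure."""
    totals = category_exposure(positions)
    totals[new_bet["category"]] = (
        totals.get(new_bet["category"], 0.0) + new_bet["amount_usdc"]
    )
    total = sum(totals.values())
    return totals[new_bet["category"]] / total <= limit
```

Call `within_category_limit` just before `place_bet` and skip the bet when it returns False. The same structure extends to the correlation screen: group positions by their shared resolution driver instead of by category.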
Tracking Calibration and Improving Over Time
The ultimate measure of prediction quality is calibration: when your agent says 70% probability, does the outcome occur approximately 70% of the time? Good calibration is both the goal and the diagnostic signal for improving your system.
After accumulating 50+ resolved bets, compute a calibration curve by binning your probability predictions into deciles (0-10%, 10-20%, etc.) and measuring the actual win rate in each bin. Most LLM-based assessors start out overconfident: they predict 80% probability on events that only occur 62% of the time. Identifying and correcting these biases through prompt engineering significantly improves long-run returns.
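The decile binning described above fits in a short function. A sketch, assuming you log each resolved bet as a `(predicted_probability, won)` pair:

```python
def calibration_curve(predictions: list[tuple[float, bool]], n_bins: int = 10):
    """Bin (predicted_probability, outcome) pairs into deciles and return
    (bin midpoint, actual win rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, won in predictions:
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 lands in the top bin
        bins[idx].append(won)
    curve = []
    for i, outcomes in enumerate(bins):
        if outcomes:
            midpoint = (i + 0.5) / n_bins
            win_rate = sum(outcomes) / len(outcomes)
            curve.append((midpoint, win_rate, len(outcomes)))
    return curve
```

A well-calibrated agent produces win rates close to each bin's midpoint; a bin whose win rate sits well below its midpoint is where overconfidence lives and where the prompt needs tightening.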
Calibration tip: If your 70-80% bucket has only a 55% actual win rate, add this instruction to your assessment prompt: "Be especially conservative when you are more than 15 percentage points away from the market consensus. Your confidence should require proportionally stronger evidence." This tends to compress overconfident predictions toward the market price, improving calibration at the extremes.
Prediction markets are one of the most direct ways for an AI agent to monetize its information synthesis advantage. The combination of a calibrated LLM, Kelly-based position sizing, portfolio diversification, and systematic calibration tracking creates a compounding edge over time. Start small, track every bet, measure calibration obsessively, and scale what works.