7 min read · March 4, 2026

Building an AI Agent That Trades Prediction Markets

Prediction markets are, in a sense, the ideal playground for AI agents. They reduce complex real-world questions to binary or scalar outcomes with clear payoffs. They aggregate information from thousands of participants into a single market price. And they reward anyone whose probability estimates are more accurate than the consensus, which is exactly what well-trained LLMs tend to be good at.

This tutorial builds a complete prediction market trading agent from scratch. By the end, you will have a running agent that queries open markets, uses an LLM to assess each question's probability, compares that assessment to the market price, places Kelly-sized bets where there's edge, and tracks performance over time.

In this tutorial
  1. How prediction markets work
  2. Where AI agents find edge
  3. Step 1: Query open markets
  4. Step 2: LLM probability assessment
  5. Step 3: Kelly criterion bet sizing
  6. Step 4: The full agent loop
  7. Portfolio management
  8. Calibration tracking

How Prediction Markets Work

Prediction markets are order books where you buy shares in outcomes. On a binary market, you can buy YES or NO shares. Each share pays out exactly $1.00 if that outcome occurs and $0.00 if it does not. A YES share priced at $0.60 implies the market believes there is a 60% chance the event happens.

This price is the market's consensus probability estimate. If you believe the true probability is higher than 60%, YES shares are underpriced relative to their expected value. If you believe it is lower, NO shares are the play.

The key insight for AI agents: prediction markets are fundamentally information-processing competitions. The market price reflects the aggregate of all public information plus the relative skill of participants at interpreting that information. An AI agent with access to good data and a calibrated reasoning model can find systematic edges: not through luck, but through genuinely better probability estimation.

What "edge" means in this context: Edge = your assessed probability minus the market price. A 10% edge means you believe the true probability is 10 percentage points higher than what the market is pricing. Over many bets, if your edge is real, you will profit. If your edge is imaginary, you will lose systematically.
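The arithmetic is worth making concrete. The numbers below are illustrative, not from a live market:

```python
# Expected value of one YES share, given your own probability estimate.
your_p = 0.70        # your assessed probability of YES
market_price = 0.60  # market's implied probability (price of a YES share)

edge = your_p - market_price                  # +0.10, a 10-point edge
ev_per_share = your_p * 1.00 - market_price   # each share pays $1 if YES occurs

print(f"edge={edge:+.2f}, EV per $0.60 share = ${ev_per_share:+.2f}")
```

Each $0.60 share is worth $0.70 in expectation under your estimate, so the bet carries roughly 17 cents of expected profit per dollar staked, if (and only if) your 70% is better calibrated than the market's 60%.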

Where AI Agents Find Edge

AI agents are well-positioned to find systematic edge in prediction markets through three capabilities: calibrated probability estimation, fast synthesis of relevant data sources, and disciplined position sizing.

That edge is not guaranteed, and it will erode as more AI agents participate in these markets. The agents that win will be those with the best calibration, the most relevant data sources, and the most disciplined sizing. This tutorial covers all three.

Step 1: Query Open Markets

Start by connecting to Purple Flea's prediction market API and pulling a filtered list of open markets. Target markets with enough liquidity to take meaningful positions and a medium-term close date, where the information half-life is long enough for the LLM's advantage to matter, then sort by 24-hour volume.

Python
import purpleflea
from datetime import datetime, timedelta

pred = purpleflea.PredictionClient(api_key="YOUR_KEY")

# Query markets suitable for LLM-based trading
markets = pred.list_markets(
    min_liquidity_usd=5000,      # enough to take $50+ positions
    closes_after=datetime.now() + timedelta(days=3),
    closes_before=datetime.now() + timedelta(days=90),
    categories=["crypto", "politics", "economics"],
    sort_by="volume_24h",
    limit=100
)

print(f"Found {len(markets)} tradeable markets")
for m in markets[:5]:
    print(
        f"\n{m['question'][:70]}"
        f"\n  YES={m['yes_price']:.2f} | NO={m['no_price']:.2f}"
        f" | liq=${m['liquidity_usd']:,.0f}"
        f" | closes {m['closes_at'][:10]}"
    )

Step 2: LLM Probability Assessment

For each market, prompt an LLM to estimate the probability of the YES outcome. The prompt structure matters enormously. You want the model to reason through available evidence, acknowledge uncertainty, and produce a calibrated numeric estimate. Prompts that just ask for a gut feeling produce uncalibrated outputs.

Provide the market consensus price in your prompt. This serves two purposes: it anchors the model's thinking to what the collective market believes, and it forces the model to justify why its estimate differs from a diverse group of participants who also have information about the same event.

Python: LLM Assessment
import anthropic, json, re

client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_KEY")

def assess_probability(question: str, market_price: float) -> dict:
    prompt = f"""You are a calibrated probability assessor for prediction markets.

Market question: {question}
Current market consensus: {market_price:.1%} probability of YES

Please estimate the probability of the YES outcome.
Consider base rates, your knowledge of the domain, and whether you have genuine
information advantage over the market consensus. Be conservative: if you are uncertain,
shade toward the market price.

Return JSON only: {{"probability": 0.XX, "confidence": "low|medium|high", "reasoning": "..."}}"""

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=400,
        messages=[{"role": "user", "content": prompt}]
    )
    text = response.content[0].text
    match = re.search(r'\{.*\}', text, re.DOTALL)  # extract the JSON object
    if not match:
        return None
    try:
        return json.loads(match.group())
    except json.JSONDecodeError:
        return None

# Screen each market for edge
opportunities = []
for market in markets:
    result = assess_probability(market["question"], market["yes_price"])
    if result:
        edge = result["probability"] - market["yes_price"]
        opportunities.append({**market, **result, "edge": edge})

# Filter to actionable edges only
high_edge = [o for o in opportunities if abs(o["edge"]) >= 0.10]
print(f"{len(high_edge)} markets with 10%+ edge")

Step 3: Kelly Criterion Bet Sizing

Once you have an edge estimate, bet sizing determines whether you survive long enough to let the edge compound. The Kelly criterion gives you the mathematically optimal fraction of your bankroll to bet on each opportunity to maximize long-run growth rate.

For a binary prediction market, the Kelly fraction is f = (p * b - q) / b, where p is your probability that the share you buy pays out, q = 1 - p, and b is the net odds on that share, (1 / price) - 1. Most practitioners use 25-50% of the full Kelly fraction, because full Kelly produces variance that often feels intolerable even when it is mathematically correct.
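A worked example with illustrative numbers makes the formula concrete: say you assess a 70% win probability on a YES share priced at $0.60.

```python
p, q = 0.70, 0.30        # your win probability and its complement
price = 0.60             # price of the YES share
b = (1 / price) - 1      # net odds: ~$0.667 profit per $1 staked on a win

full_kelly = (p * b - q) / b       # ~0.25 -> stake 25% of bankroll at full Kelly
quarter_kelly = 0.25 * full_kelly  # ~0.0625 -> 6.25% at the quarter-Kelly scale
```

Even with a large 10-point edge, quarter Kelly stakes only about 6% of the bankroll, which is the kind of restraint that lets an agent survive a run of wrong calls.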

Python: Kelly Sizing and Bet Placement
def kelly_fraction(p: float, market_price: float, scale: float = 0.25) -> float:
    """Fractional Kelly size (default 25% of full Kelly).

    p is the probability that the share being bought pays out;
    market_price is that share's current price."""
    q = 1 - p
    b = (1 / market_price) - 1  # net payout odds on YES
    full_kelly = (p * b - q) / b
    return max(0, full_kelly * scale)

bankroll = 500.0  # $500 USDC starting capital

for opp in sorted(high_edge, key=lambda x: abs(x["edge"]), reverse=True):
    # Skip low-confidence assessments
    if opp["confidence"] == "low":
        continue

    direction = "YES" if opp["edge"] > 0 else "NO"
    price = opp[f"{direction.lower()}_price"]
    # Kelly needs the win probability of the share being bought:
    # for a NO bet that is 1 - P(YES), not P(YES) itself
    p_win = opp["probability"] if direction == "YES" else 1 - opp["probability"]
    frac = kelly_fraction(p_win, price)
    size = min(bankroll * frac, 50)  # hard cap per bet

    if size < 5 or bankroll < size:
        continue

    bet = pred.place_bet(
        market_id=opp["market_id"],
        outcome=direction,
        amount_usdc=round(size, 2),
        max_price=price * 1.02  # accept up to 2% slippage
    )
    bankroll -= size
    print(f"{direction} ${size:.2f} edge={opp['edge']:+.1%} q={opp['question'][:50]}")

Step 4: The Full Agent Loop

Now stitch the pieces into a continuous loop. The agent wakes every 4 hours, checks for resolved positions and logs their outcomes against predictions, queries new markets, skips any market where it already holds a position, and places new bets on fresh opportunities.

Python: Agent Loop
import time, logging

log = logging.getLogger("pred_agent")
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

def run_cycle(bankroll: float) -> float:
    # 1. Collect resolved market outcomes
    resolved = pred.list_resolved(agent_id="agent_pred", days_back=1)
    for r in resolved:
        bankroll += r["pnl_usdc"]
        log.info(f"Resolved: {r['question'][:45]} PnL=${r['pnl_usdc']:+.2f}")

    # 2. Find open positions to avoid doubling up
    open_ids = {p["market_id"] for p in pred.get_positions(agent_id="agent_pred")}

    # 3. Pull fresh markets and filter to new ones
    markets = pred.list_markets(min_liquidity_usd=5000, sort_by="volume_24h", limit=80)
    fresh = [m for m in markets if m["market_id"] not in open_ids]
    log.info(f"{len(fresh)} new markets to assess")

    # 4. Assess and bet up to 30 markets per cycle (LLM cost control)
    placed = 0
    for m in fresh[:30]:
        r = assess_probability(m["question"], m["yes_price"])
        if not r or r["confidence"] == "low":
            continue
        edge = r["probability"] - m["yes_price"]
        if abs(edge) < 0.10:
            continue
        direction = "YES" if edge > 0 else "NO"
        price = m[f"{direction.lower()}_price"]
        # win probability of the share being bought (1 - P(YES) for a NO bet)
        p_win = r["probability"] if direction == "YES" else 1 - r["probability"]
        size = min(bankroll * kelly_fraction(p_win, price), 50)
        if size >= 5 and bankroll >= size:
            pred.place_bet(market_id=m["market_id"], outcome=direction,
                           amount_usdc=round(size, 2), max_price=price * 1.02)
            bankroll -= size
            placed += 1

    log.info(f"Placed {placed} bets | Bankroll ${bankroll:.2f}")
    return bankroll

bankroll = 500.0
while True:
    bankroll = run_cycle(bankroll)
    time.sleep(14400)  # 4 hour cycle

Portfolio Management Across Many Markets

As your agent accumulates positions across many markets, portfolio-level thinking becomes important. You do not want heavy exposure to correlated markets: if your agent holds YES positions on 10 different questions that all depend on whether the Fed cuts rates, that is one large bet dressed as 10 smaller ones.

Implement these guardrails in your agent loop:
  1. Cap total exposure per category so correlated positions cannot dominate the book.
  2. Cap the number of simultaneous open positions.
  3. Keep a fraction of the bankroll in reserve so a losing streak never takes the agent out of the game.
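One such guardrail, a per-category exposure cap, can be sketched as follows. The 20% cap and the shape of the position dicts are illustrative assumptions, not guarantees about the Purple Flea API:

```python
from collections import defaultdict

def category_exposure_ok(positions, category, new_size, bankroll, cap=0.20):
    """Reject a bet if it would push one category past `cap` of bankroll.

    `positions` is assumed to be a list of dicts with 'category' and
    'amount_usdc' keys; adapt the keys to your actual position schema.
    """
    exposure = defaultdict(float)
    for p in positions:
        exposure[p["category"]] += p["amount_usdc"]
    return exposure[category] + new_size <= cap * bankroll

# Example: $90 already in crypto against a $500 bankroll and a 20% ($100) cap
positions = [{"category": "crypto", "amount_usdc": 90.0}]
print(category_exposure_ok(positions, "crypto", 20.0, 500.0))  # False: 110 > 100
print(category_exposure_ok(positions, "crypto", 5.0, 500.0))   # True: 95 <= 100
```

Call a check like this just before place_bet in the agent loop, and skip the bet when it returns False.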

Tracking Calibration and Improving Over Time

The ultimate measure of prediction quality is calibration: when your agent says 70% probability, does the outcome occur approximately 70% of the time? Good calibration is both the goal and the diagnostic signal for improving your system.

After accumulating 50+ resolved bets, compute a calibration curve by binning your probability predictions into deciles (0–10%, 10–20%, etc.) and measuring the actual win rate in each bin. Most LLM-based assessors start overconfident: they predict 80% probability on events that only occur 62% of the time. Identifying and correcting these biases through prompt engineering significantly improves long-run returns.
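The curve itself is a few lines of code. A sketch over a log of (predicted probability, outcome) pairs; the record format is an assumption, and your agent would build it from its resolved bets:

```python
def calibration_curve(records, n_bins=10):
    """Bin (predicted_prob, outcome) pairs into deciles and compare the
    mean prediction to the actual win rate within each bin."""
    bins = [[] for _ in range(n_bins)]
    for prob, won in records:
        idx = min(int(prob * n_bins), n_bins - 1)  # prob=1.0 goes in the top bin
        bins[idx].append((prob, won))
    curve = []
    for idx, items in enumerate(bins):
        if not items:
            continue
        mean_pred = sum(p for p, _ in items) / len(items)
        win_rate = sum(w for _, w in items) / len(items)
        curve.append((idx, mean_pred, win_rate, len(items)))
    return curve

# Illustrative log: 0.72-0.78 predictions that win only half the time
records = [(0.75, 1), (0.75, 0), (0.72, 1), (0.78, 0)]
for idx, pred, actual, n in calibration_curve(records):
    print(f"bin {idx}: predicted {pred:.2f}, actual {actual:.2f}, n={n}")
```

A well-calibrated agent shows mean_pred tracking win_rate across bins; a persistent gap in one direction is the overconfidence signal described above.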

Calibration tip: If your 70–80% bucket has only a 55% actual win rate, add this instruction to your assessment prompt: "Be especially conservative when you are more than 15 percentage points away from the market consensus. Your confidence should require proportionally stronger evidence." This tends to compress overconfident predictions toward the market price, improving calibration at the extremes.
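A mechanical complement to the prompt fix is to shrink the model's estimate toward the market price in post-processing. This is not part of the pipeline above; the function and its 0.5 default are an illustrative sketch to tune against your own calibration data:

```python
def shrink_toward_market(model_p, market_p, k=0.5):
    """Blend the LLM estimate with the market consensus.
    k=1.0 trusts the model fully; k=0.0 trusts the market fully."""
    return market_p + k * (model_p - market_p)

# With k=0.5, a 0.85 estimate against a 0.60 market keeps half the
# 25-point gap, landing at roughly 0.725
adjusted = shrink_toward_market(0.85, 0.60)
```

If the calibration curve shows overconfidence concentrated at the extremes, a smaller k recovers much of the lost calibration without any prompt changes.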

Prediction markets are one of the most direct ways for an AI agent to monetize its information synthesis advantage. The combination of a calibrated LLM, Kelly-based position sizing, portfolio diversification, and systematic calibration tracking creates a compounding edge over time. Start small, track every bet, measure calibration obsessively, and scale what works.