Cerebras + Purple Flea

The fastest AI inference meets real-time financial markets

Cerebras delivers 1000+ tokens per second — 25x faster than GPU-based inference. For trading agents, every millisecond matters. Pair Cerebras speed with Purple Flea's financial execution APIs.

1000+ tokens/second
275 perp markets
6 financial APIs
<1s decision latency

Why inference speed matters for trading agents

Trading is a race. The faster your agent reasons, the better the price it gets. Cerebras's Wafer-Scale Engine delivers inference at speeds that make real-time market participation genuinely viable for AI agents.

The speed advantage is compounding

If inference is the bottleneck, a Cerebras-based agent can make up to 25x more decisions per second than an agent running on standard GPU inference. Over a 24/7 trading operation, that means up to 25x more opportunities identified, up to 25x more positions evaluated, and a correspondingly faster reaction to market-moving events.

| Provider | Model | Speed (tokens/sec) | Typical latency (100-token response) |
|---|---|---|---|
| Cerebras (Wafer-Scale Engine) | Llama-3.3-70B | 1,750 t/s | <60ms |
| Groq (LPU) | Llama-3.3-70B | ~800 t/s | ~125ms |
| Together AI | Llama-3.1-70B | ~300 t/s | ~330ms |
| OpenAI | GPT-4o | ~40 t/s | 2,500ms |
| Anthropic | Claude Sonnet 4.6 | ~30 t/s | 3,300ms |
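
The latency column follows from the speed column by simple division. The sketch below reproduces that arithmetic using the throughput figures from the table; the function name is illustrative, and the calculation assumes a steady decode rate with no network round trip, queueing, or time to first token, so real end-to-end latency will be higher.

```python
def generation_latency_ms(tokens: int, tokens_per_sec: float) -> float:
    """Time to generate `tokens` at a steady decode rate, in milliseconds."""
    return tokens / tokens_per_sec * 1000


# Throughput figures copied from the comparison table.
providers = {
    "Cerebras": 1750,
    "Groq": 800,
    "Together AI": 300,
    "OpenAI": 40,
    "Anthropic": 30,
}

for name, tps in providers.items():
    ms = generation_latency_ms(100, tps)
    # 1000 / ms is the upper bound on sequential 100-token responses per second
    print(f"{name:<12} {ms:7.0f} ms per 100-token response, "
          f"{1000 / ms:5.1f} responses/sec")
```

The responses-per-second figure is only an upper bound for a single sequential loop. An agent that also waits on market data and order placement will complete fewer decisions per second.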
Workflow: High-frequency decision loop with Cerebras

1. Market tick arrives (t=0ms)

Order book snapshot, last trade price, funding rate, and open interest are fetched from Purple Flea's market data endpoint. Data is packed into a prompt.

2. Cerebras inference (t=60ms)

Cerebras processes the 200-token prompt and returns a trading decision (direction, size, stop-loss, take-profit) in under 60ms, faster than a human blink.

3. Order placement (t=80ms)

Purple Flea routes the order to Hyperliquid. The entire cycle, from market data to execution, completes in under 100ms from tick arrival.

4. Position monitoring + loop (continuous)

Cerebras monitors the open position at every tick. Trailing stop-loss, partial profit-taking, and regime change detection all happen in real time without human oversight.
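
Step 4 mentions a trailing stop-loss without showing one. Below is a minimal sketch for a long position, assuming the stop is tracked locally in plain Python rather than by the model. The `TrailingStop` class, its fields, and the 2% trail are hypothetical and not part of Purple Flea's API.

```python
from dataclasses import dataclass


@dataclass
class TrailingStop:
    """Trailing stop for a long position; trail_pct is a fraction (0.02 = 2%)."""
    entry_price: float
    trail_pct: float
    high_water: float = 0.0

    def __post_init__(self) -> None:
        # The highest price seen so far starts at the entry price
        self.high_water = self.entry_price

    @property
    def stop_price(self) -> float:
        # The stop sits a fixed percentage below the highest price seen
        return self.high_water * (1 - self.trail_pct)

    def update(self, price: float) -> bool:
        """Feed one tick; return True when the price has fallen to the stop."""
        self.high_water = max(self.high_water, price)
        return price <= self.stop_price


stop = TrailingStop(entry_price=100.0, trail_pct=0.02)
for price in [100.0, 103.0, 105.0, 104.0, 102.5]:
    if stop.update(price):
        print(f"stop hit at {price} (stop level {stop.stop_price:.2f})")
        break
```

The stop only ratchets upward, so gains made on the way to 105.0 are protected when the price falls back. A short position would mirror this by tracking the lowest price and placing the stop above it.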

Cerebras + Purple Flea trading loop in Python

A high-frequency decision loop using the Cerebras Python SDK, whose client follows the OpenAI chat-completions interface, and Purple Flea's trading API. It makes one decision per polling interval (1 second by default).

cerebras_hf_trader.py Python
import time
import json
import requests
from cerebras.cloud.sdk import Cerebras

# Cerebras uses a familiar OpenAI-style client
cerebras_client = Cerebras(api_key="csk-your-key-here")
PF_API_KEY = "pf_live_your_key_here"
PF_BASE    = "https://purpleflea.com/api/v1"

SYSTEM_PROMPT = """You are a high-frequency crypto trading agent.
You receive real-time market data every second and must output a
JSON trading decision instantly. Format: {"action": "long"|"short"|"hold"|"close",
"confidence": 0-100, "size_pct": 0-20, "reasoning": "brief"}
Be decisive. Speed is the edge. Short reasoning only."""

def get_market_snapshot(market: str) -> dict:
    """Fetch live market data from Purple Flea."""
    resp = requests.get(
        f"{PF_BASE}/trading/market/{market}",
        headers={"X-API-Key": PF_API_KEY},
        timeout=5
    )
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
    return resp.json()

def cerebras_decide(snapshot: dict) -> dict:
    """Get a trading decision from Cerebras and log the measured round-trip latency."""
    t0 = time.perf_counter()

    resp = cerebras_client.chat.completions.create(
        model = "llama-3.3-70b",
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": (
                f"Market: {snapshot['market']}\n"
                f"Price: {snapshot['price']}\n"
                f"24h change: {snapshot['change_24h']}%\n"
                f"Funding rate: {snapshot['funding_rate']}%\n"
                f"Open interest: ${snapshot['open_interest']:,.0f}\n"
                f"Bid/Ask spread: {snapshot['spread_bps']} bps\n"
                "Decide: long, short, hold, or close?"
            )}
        ],
        max_tokens = 80,   # short response = faster
        temperature = 0.1
    )

    latency_ms = (time.perf_counter() - t0) * 1000
    content = resp.choices[0].message.content
    print(f"  Cerebras latency: {latency_ms:.0f}ms")

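    try:
        decision = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        # Non-JSON or empty model output: fall back to a safe no-op
        return {"action": "hold", "confidence": 0}
    # Reject well-formed JSON that lacks the fields the loop relies on
    if (not isinstance(decision, dict)
            or "action" not in decision or "confidence" not in decision):
        return {"action": "hold", "confidence": 0}
    return decision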

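def execute_decision(market: str, decision: dict,
                     wallet_balance: float) -> dict | None:
    """Execute the trade decision on Purple Flea."""
    # Only "long" and "short" open a position. "hold" is a no-op, and "close"
    # needs a close-position call that this example does not implement.
    if decision.get("action") not in ("long", "short"):
        return None
    try:
        confidence = float(decision.get("confidence", 0))
        size_pct = float(decision.get("size_pct", 0))
    except (TypeError, ValueError):
        return None  # malformed numbers from the model: do not trade
    if confidence < 70:
        print("  Low confidence: skipping")
        return None

    # Clamp to the 0-20% range promised in SYSTEM_PROMPT
    size_pct = min(max(size_pct, 0.0), 20.0)
    size = wallet_balance * (size_pct / 100)
    if size <= 0:
        return None
    resp = requests.post(
        f"{PF_BASE}/trading/perp/open",
        headers={"X-API-Key": PF_API_KEY},
        json={
            "market":     market,
            "side":       decision["action"],  # "long" or "short"
            "size_usd":   round(size, 2),
            "leverage":   3,
            "order_type": "market"
        },
        timeout=3
    )
    resp.raise_for_status()
    return resp.json()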

def run_hf_loop(market: str = "BTC-PERP", interval_s: float = 1.0):
    """High-frequency decision loop: one Cerebras call per tick."""
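    balance_resp = requests.get(
        f"{PF_BASE}/wallet/balance",
        headers={"X-API-Key": PF_API_KEY},
        timeout=5
    )
    balance_resp.raise_for_status()
    # Note: the balance is read once at startup and reused for sizing all run long
    balance = balance_resp.json()["usdc_balance"]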
    print(f"[+] Starting HF loop | Market: {market} | Balance: ${balance:.2f}")

    while True:
        tick_start = time.perf_counter()
        snapshot   = get_market_snapshot(market)
        decision   = cerebras_decide(snapshot)

        print(f"  Action: {decision.get('action')} | Confidence: {decision.get('confidence')}%")
        result = execute_decision(market, decision, balance)
        if result:
            print(f"  Executed: {result}")

        elapsed = time.perf_counter() - tick_start
        time.sleep(max(0, interval_s - elapsed))

if __name__ == "__main__":
    run_hf_loop("BTC-PERP", interval_s=1.0)
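
Models do not always return bare JSON; a reply may wrap the object in prose or a code fence. The sketch below shows one way to make parsing more tolerant, assuming the decision format from SYSTEM_PROMPT. The `parse_decision` helper and its fallback values are hypothetical additions, not part of either SDK.

```python
import json

VALID_ACTIONS = ("long", "short", "hold", "close")
HOLD = {"action": "hold", "confidence": 0.0, "size_pct": 0.0,
        "reasoning": "unparseable"}


def parse_decision(text: str) -> dict:
    """Parse the span from the first '{' to the last '}' and validate it.

    Returns a hold decision on any parsing or validation failure.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return dict(HOLD)
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return dict(HOLD)
    if not isinstance(data, dict) or data.get("action") not in VALID_ACTIONS:
        return dict(HOLD)
    try:
        confidence = float(data.get("confidence", 0))
        size_pct = float(data.get("size_pct", 0))
    except (TypeError, ValueError):
        return dict(HOLD)
    return {
        "action": data["action"],
        # Clamp both numbers to the ranges stated in the prompt
        "confidence": min(max(confidence, 0.0), 100.0),
        "size_pct": min(max(size_pct, 0.0), 20.0),
        "reasoning": str(data.get("reasoning", "")),
    }


print(parse_decision('Decision: {"action": "long", "confidence": 85, "size_pct": 50}'))
```

Falling back to "hold" means a malformed reply costs one skipped tick rather than an unintended order.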

6 financial APIs for your Cerebras agent

Cerebras gives you the fastest reasoning. Purple Flea gives you the markets to trade on. Together they form a complete high-frequency AI trading stack.

🎰 Casino
Provably fair crash, coin-flip, and dice games. USDC stakes. Cerebras can run hundreds of statistical decisions per minute for casino strategies.
POST /casino/bet

📈 Perpetual Trading
275 markets via Hyperliquid. Long and short, up to 20x leverage. Market and limit orders. Ideal for Cerebras's sub-100ms decision loops.
POST /trading/perp/open

👛 Multi-chain Wallets
ETH, BTC, SOL, TRX, XMR, XRP. Cerebras agents can manage complex multi-chain portfolios with instant rebalancing decisions at each market tick.
GET /wallet/balance

🌐 Domain Registration
Register .com, .ai, .io domains instantly. Cerebras agents can snipe newly expiring valuable domains faster than any human operator.
POST /domains/register

💧 Faucet
New agents claim $1 free USDC to start. Get funded, run the high-frequency loop, and scale up from there, with zero upfront commitment.
POST /faucet/claim

🔒 Escrow
Trustless agent-to-agent payments at 1% fee. 15% referral commission. Fast-settling escrow for high-throughput agent payment flows.
POST /escrow/create
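
The endpoints listed here share a base URL and the `X-API-Key` header used in the trading loop. The sketch below builds, but does not send, an authenticated request for any of them. The `build_request` helper is hypothetical, it uses the standard library's `urllib` instead of `requests` to stay dependency-free, and the empty faucet payload is an assumption, since request bodies for these endpoints are not documented on this page.

```python
import json
import urllib.request
from typing import Optional

PF_BASE = "https://purpleflea.com/api/v1"  # base URL from the trading loop


def build_request(method: str, path: str, api_key: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated Purple Flea request."""
    body = json.dumps(payload).encode() if payload is not None else None
    headers = {"X-API-Key": api_key}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(PF_BASE + path, data=body,
                                  headers=headers, method=method)


balance_req = build_request("GET", "/wallet/balance", "pf_live_your_key_here")
# Assumption: the faucet accepts an empty JSON body
faucet_req = build_request("POST", "/faucet/claim", "pf_live_your_key_here", {})

for req in (balance_req, faucet_req):
    print(req.get_method(), req.full_url)
```

Passing a built request to `urllib.request.urlopen(req, timeout=5)` would send it. Keeping construction separate makes the URL and headers easy to inspect before any funds are involved.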

Other fast inference providers

Purple Flea works with all major AI inference providers — choose the speed-cost tradeoff that fits your agent.