Cerebras delivers 1000+ tokens per second — 25x faster than GPU-based inference. For trading agents, every millisecond matters. Pair Cerebras speed with Purple Flea's financial execution APIs.
Trading is a race. The faster your agent reasons, the better the price it gets. Cerebras's Wafer-Scale Engine delivers inference at speeds that make real-time market participation genuinely viable for AI agents.
Because each decision is one inference call, a Cerebras-based agent can make up to 25x more decisions per second than a standard GPU inference agent when inference is the bottleneck. Over a 24/7 trading operation, that means up to 25x more opportunities identified, 25x more positions evaluated, and far faster reaction to market-moving events.
| Provider | Model | Typical latency (100-token response) |
|---|---|---|
| Cerebras (Wafer-Scale Engine) | Llama-3.3-70B | <60ms |
| Groq (LPU) | Llama-3.3-70B | ~125ms |
| Together AI | Llama-3.1-70B | ~330ms |
| OpenAI | GPT-4o | 2,500ms |
| Anthropic | Claude Sonnet 4.6 | 3,300ms |
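To put the latency figures in the table into trading terms, the short sketch below converts each one into an upper bound on sequential decisions per second for a single agent. It covers inference latency only; network round trips and order execution are not included, and `<60ms` is treated as 60 ms, so the Cerebras figure is a floor rather than a measurement.

```python
# Latency figures copied from the table (milliseconds per 100-token response).
# "<60ms" is treated as 60 ms, so the Cerebras rate shown is a lower bound.
LATENCY_MS = {
    "Cerebras": 60,
    "Groq": 125,
    "Together AI": 330,
    "OpenAI": 2500,
    "Anthropic": 3300,
}


def decisions_per_second(latency_ms: float) -> float:
    """Upper bound on back-to-back decisions per second for one agent."""
    return 1000.0 / latency_ms


for provider, ms in LATENCY_MS.items():
    print(f"{provider:12s} {decisions_per_second(ms):6.2f} decisions/sec")
```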
Order book snapshot, last trade price, funding rate, and open interest are fetched from Purple Flea's market data endpoint. Data is packed into a prompt.
Cerebras processes the 200-token prompt and returns a trading decision (direction, size, stop-loss, take-profit) in under 60ms — faster than a human blink.
Purple Flea routes the order to Hyperliquid. The entire cycle — market data to execution — completes in under 100ms from tick arrival.
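One way to check the 100ms target in practice is to time each stage of the cycle separately. The sketch below is a minimal illustration: `TickTimer` and the stage names are hypothetical helpers, not part of the Cerebras SDK or the Purple Flea API, and the `time.sleep` calls stand in for the real fetch, inference, and order calls.

```python
import time
from contextlib import contextmanager

BUDGET_MS = 100.0  # target from the text: market data to execution


class TickTimer:
    """Records how long each named stage of one tick takes, in milliseconds."""

    def __init__(self):
        self.stages = {}

    @contextmanager
    def stage(self, name):
        t0 = time.perf_counter()
        try:
            yield
        finally:
            self.stages[name] = (time.perf_counter() - t0) * 1000.0

    def total_ms(self):
        return sum(self.stages.values())

    def over_budget(self, budget_ms=BUDGET_MS):
        return self.total_ms() > budget_ms


timer = TickTimer()
with timer.stage("fetch"):
    time.sleep(0.005)  # stand-in for the market data request
with timer.stage("decide"):
    time.sleep(0.005)  # stand-in for the Cerebras call
with timer.stage("execute"):
    time.sleep(0.005)  # stand-in for the order request

for name, ms in timer.stages.items():
    print(f"{name:8s} {ms:6.1f} ms")
print(f"total    {timer.total_ms():6.1f} ms | over budget: {timer.over_budget()}")
```

Logging the per-stage breakdown makes it clear whether inference or the network hops dominate a slow tick.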
The Cerebras-backed agent re-evaluates the open position at every tick. Trailing stop-loss, partial profit-taking, and regime-change detection all happen in real time without human oversight.
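As an illustration of the trailing stop-loss mentioned above, here is a minimal sketch for a long position. It is a deterministic rule that could run alongside the model's per-tick decision, not a description of how Purple Flea or Cerebras implement it; the `TrailingStop` class and its `trail_pct` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrailingStop:
    """Trailing stop for a long position (hypothetical helper)."""

    entry_price: float
    trail_pct: float  # e.g. 1.0 means the stop sits 1% below the highest price seen
    peak: float = 0.0

    def __post_init__(self):
        # The peak starts at the entry price and only ever moves up.
        self.peak = self.entry_price

    @property
    def stop_price(self) -> float:
        return self.peak * (1 - self.trail_pct / 100)

    def update(self, price: float) -> bool:
        """Feed one tick; return True when the position should be closed."""
        self.peak = max(self.peak, price)
        return price <= self.stop_price


stop = TrailingStop(entry_price=100.0, trail_pct=1.0)
for tick in (101.0, 105.0, 104.0, 103.9):
    print(tick, "close" if stop.update(tick) else "hold")
```

A short position would mirror this by tracking the lowest price seen and placing the stop above it.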
A high-frequency decision loop using Cerebras's OpenAI-compatible SDK and Purple Flea's trading API. Designed to process one decision per market tick.
```python
import time
import json
import requests
from cerebras.cloud.sdk import Cerebras

# Cerebras uses a familiar OpenAI-style client
cerebras_client = Cerebras(api_key="csk-your-key-here")

PF_API_KEY = "pf_live_your_key_here"
PF_BASE = "https://purpleflea.com/api/v1"

SYSTEM_PROMPT = """You are a high-frequency crypto trading agent.
You receive real-time market data every second and must output
a JSON trading decision instantly.
Format: {"action": "long"|"short"|"hold"|"close", "confidence": 0-100, "size_pct": 0-20, "reasoning": "brief"}
Be decisive. Speed is the edge. Short reasoning only."""


def get_market_snapshot(market: str) -> dict:
    """Fetch live market data from Purple Flea."""
    resp = requests.get(
        f"{PF_BASE}/trading/market/{market}",
        headers={"X-API-Key": PF_API_KEY},
        timeout=5,
    )
    return resp.json()


def cerebras_decide(snapshot: dict) -> dict:
    """Get a trading decision from Cerebras in <60ms."""
    t0 = time.perf_counter()
    resp = cerebras_client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": (
                f"Market: {snapshot['market']}\n"
                f"Price: {snapshot['price']}\n"
                f"24h change: {snapshot['change_24h']}%\n"
                f"Funding rate: {snapshot['funding_rate']}%\n"
                f"Open interest: ${snapshot['open_interest']:,.0f}\n"
                f"Bid/Ask spread: {snapshot['spread_bps']} bps\n"
                "Decide: long, short, hold, or close?"
            )},
        ],
        max_tokens=80,  # short response = faster
        temperature=0.1,
    )
    latency_ms = (time.perf_counter() - t0) * 1000
    content = resp.choices[0].message.content
    print(f"  Cerebras latency: {latency_ms:.0f}ms")
    # Fall back to "hold" if the model output is not a usable JSON object
    try:
        decision = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        decision = None
    if not isinstance(decision, dict) or "action" not in decision:
        return {"action": "hold", "confidence": 0}
    return decision


def execute_decision(market: str, decision: dict, wallet_balance: float) -> dict | None:
    """Execute the trade decision on Purple Flea."""
    # Only "long" and "short" open a position. "hold" does nothing, and
    # "close" needs a close-position call that this example does not cover.
    if decision["action"] not in ("long", "short"):
        return None
    if decision.get("confidence", 0) < 70:
        print("  Low confidence, skipping")
        return None

    size = wallet_balance * (decision.get("size_pct", 0) / 100)
    resp = requests.post(
        f"{PF_BASE}/trading/perp/open",
        headers={"X-API-Key": PF_API_KEY},
        json={
            "market": market,
            "side": decision["action"],  # "long" or "short"
            "size_usd": round(size, 2),
            "leverage": 3,
            "order_type": "market",
        },
        timeout=3,
    )
    return resp.json()


def run_hf_loop(market: str = "BTC-PERP", interval_s: float = 1.0):
    """High-frequency decision loop: one Cerebras call per tick."""
    balance_resp = requests.get(
        f"{PF_BASE}/wallet/balance",
        headers={"X-API-Key": PF_API_KEY},
        timeout=5,
    )
    balance = balance_resp.json()["usdc_balance"]
    print(f"[+] Starting HF loop | Market: {market} | Balance: ${balance:.2f}")

    while True:
        tick_start = time.perf_counter()

        snapshot = get_market_snapshot(market)
        decision = cerebras_decide(snapshot)
        print(f"  Action: {decision['action']} | Confidence: {decision.get('confidence', 0)}%")

        result = execute_decision(market, decision, balance)
        if result:
            print(f"  Executed: {result}")

        # Sleep for whatever is left of the tick interval
        elapsed = time.perf_counter() - tick_start
        time.sleep(max(0, interval_s - elapsed))


if __name__ == "__main__":
    run_hf_loop("BTC-PERP", interval_s=1.0)
```
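The loop above acts on whatever JSON the model returns. Since a language model can wrap its answer in extra text or emit out-of-range values, a stricter parser is a sensible guard before any order is placed. The sketch below is one possible approach; `parse_decision` is a hypothetical helper, and the bounds (confidence 0-100, size 0-20% of balance) are taken from the system prompt in the example.

```python
import json
import math

ALLOWED_ACTIONS = {"long", "short", "hold", "close"}
HOLD = {"action": "hold", "confidence": 0.0, "size_pct": 0.0,
        "reasoning": "invalid model output"}


def parse_decision(content: str) -> dict:
    """Parse and sanity-check the model's JSON; fall back to hold on any problem."""
    try:
        # Tolerate extra text around the object: slice from first "{" to last "}"
        start, end = content.index("{"), content.rindex("}") + 1
        raw = json.loads(content[start:end])
    except (ValueError, AttributeError):  # missing braces, bad JSON, or non-string input
        return dict(HOLD)

    action = raw.get("action") if isinstance(raw, dict) else None
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return dict(HOLD)

    try:
        confidence = float(raw.get("confidence", 0))
        size_pct = float(raw.get("size_pct", 0))
    except (TypeError, ValueError):
        return dict(HOLD)
    if not (math.isfinite(confidence) and math.isfinite(size_pct)):
        return dict(HOLD)

    # Clamp to the ranges the system prompt asks for
    return {
        "action": action,
        "confidence": min(max(confidence, 0.0), 100.0),
        "size_pct": min(max(size_pct, 0.0), 20.0),
        "reasoning": str(raw.get("reasoning", "")),
    }


print(parse_decision('Sure: {"action": "long", "confidence": 150, "size_pct": 50}'))
print(parse_decision("no JSON here"))
```

Clamping `size_pct` in code means a malformed response cannot size a position beyond the limit stated in the prompt.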
Cerebras gives you the fastest reasoning. Purple Flea gives you the markets to trade on. Together they form a complete high-frequency AI trading stack.
Purple Flea works with all major AI inference providers — choose the speed-cost tradeoff that fits your agent.