1. HFT Fundamentals for AI Agents
High-frequency trading in the context of AI agents differs from traditional HFT in one critical dimension: the decision-making cycle now includes an LLM inference step. Where traditional HFT systems operate in microseconds using deterministic logic, AI agents introduce a stochastic, variable-latency inference step that must be carefully isolated from the execution path.
The solution is a two-layer architecture: a fast path (deterministic, microsecond latency) that handles order execution, risk checks, and position management; and a slow path (LLM inference, strategy updates, alpha generation) that runs asynchronously and feeds pre-computed signals into the fast path.
LLM inference latency typically ranges from 50 ms to 2,000 ms, which is several orders of magnitude slower than a microsecond-scale execution path. Pre-compute all signals in the slow path and pass them to your execution engine as numeric thresholds. The execution engine must be pure logic with no external calls other than the order connection itself.
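A minimal sketch of that split, assuming a hypothetical `Threshold` record as the only thing that crosses from the slow path to the fast path. The `asyncio.sleep` call stands in for LLM inference; none of these names come from a real API.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Numeric signal produced by the slow path (hypothetical shape)."""
    side: str             # 'buy' | 'sell'
    trigger_price: float


async def slow_path(out: asyncio.Queue, delay_s: float = 0.05) -> None:
    # Stand-in for LLM inference: variable latency, runs off the hot path.
    await asyncio.sleep(delay_s)
    await out.put(Threshold(side="buy", trigger_price=100.0))


def fast_path(threshold: Threshold, price: float) -> bool:
    # Pure comparison: no I/O, no awaits, no model calls.
    if threshold.side == "buy":
        return price <= threshold.trigger_price
    return price >= threshold.trigger_price


async def demo() -> list[bool]:
    queue: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(slow_path(queue))
    threshold = await queue.get()   # fast path only ever sees finished signals
    await producer
    return [fast_path(threshold, p) for p in (101.0, 100.0, 99.5)]
```

The point of the design is that `fast_path` never waits on the model: if the slow path stalls, the executor keeps trading on the last threshold it received or on none at all.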
What Constitutes "High Frequency" for On-Chain Agents
On-chain trading has fundamentally different latency floors than traditional securities markets. Block times create hard lower bounds: roughly 12 seconds on Ethereum mainnet, about 400 ms on Solana, and around 2 seconds on many L2s (some are faster). "HFT" for on-chain agents means optimizing within these constraints:
- Mempool monitoring: See pending transactions before they confirm, enabling front-run detection and sandwich attack prevention
- Private mempools: Submit transactions via Flashbots/MEV Blocker to avoid being front-run
- Gas price optimization: Real-time base fee tracking to avoid paying excess priority fees
- Multi-venue arbitrage: Monitor price spreads across DEXes simultaneously, execute atomic arbitrage within a single block
- Intent-based settlement: Submit intents to solvers who execute optimally without exposing your order to the mempool
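As one concrete case from the list above, base fee tracking on Ethereum mainnet can be done deterministically, because EIP-1559 fixes how the base fee moves from block to block. The sketch below implements that update rule. The `fee_caps` policy (covering a couple of consecutive full blocks) is an illustrative assumption, not a recommendation.

```python
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # from EIP-1559: at most 12.5% change per block
ELASTICITY_MULTIPLIER = 2            # gas target = gas limit / 2


def next_base_fee(parent_base_fee: int, parent_gas_used: int,
                  parent_gas_limit: int) -> int:
    """Base fee (wei) of the next block, per the EIP-1559 update rule."""
    target = parent_gas_limit // ELASTICITY_MULTIPLIER
    if parent_gas_used == target:
        return parent_base_fee
    if parent_gas_used > target:
        delta = max(
            parent_base_fee * (parent_gas_used - target)
            // target // BASE_FEE_MAX_CHANGE_DENOMINATOR,
            1,
        )
        return parent_base_fee + delta
    delta = (parent_base_fee * (target - parent_gas_used)
             // target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
    return parent_base_fee - delta


def fee_caps(base_fee_next: int, priority_fee: int,
             headroom_blocks: int = 2) -> tuple[int, int]:
    """Hypothetical policy: max fee survives `headroom_blocks` more full blocks."""
    worst_case = base_fee_next
    for _ in range(headroom_blocks):
        worst_case += worst_case // BASE_FEE_MAX_CHANGE_DENOMINATOR
    return worst_case + priority_fee, priority_fee


GWEI = 10**9
predicted = next_base_fee(100 * GWEI, 30_000_000, 30_000_000)  # completely full block
max_fee, tip = fee_caps(predicted, 2 * GWEI)
```

Because the next base fee is known as soon as the parent block is sealed, the only quantity left to estimate is the priority fee.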
2. The Latency Stack
Every millisecond between receiving a market signal and placing an order is a millisecond of alpha decay. Understanding where latency accumulates in your stack allows you to target optimization effort effectively.
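One way to find where latency accumulates is to time each stage of the pipeline separately. The sketch below is a small per-stage timer built on `time.perf_counter_ns()`. The stage names and the sample message are illustrative assumptions.

```python
import json
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates elapsed nanoseconds spent in named pipeline stages."""

    def __init__(self) -> None:
        self.total_ns: dict[str, int] = defaultdict(int)
        self.count: dict[str, int] = defaultdict(int)

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            self.total_ns[name] += time.perf_counter_ns() - start
            self.count[name] += 1

    def mean_us(self, name: str) -> float:
        """Mean time per call for a stage, in microseconds."""
        return self.total_ns[name] / self.count[name] / 1_000


timer = StageTimer()
raw = '{"type": "tick", "price": "100.5"}'
for _ in range(1_000):
    with timer.stage("parse"):
        msg = json.loads(raw)
    with timer.stage("decide"):
        should_buy = float(msg["price"]) <= 101.0

for name in ("parse", "decide"):
    print(f"{name}: {timer.mean_us(name):.2f} us/call")
```

The context manager adds its own overhead, so treat the numbers as relative: they show which stage dominates, not the absolute cost of each one.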
Python Async WebSocket Order Execution
Python is not the ideal language for microsecond HFT, but it is the primary language for AI agent development. The following pattern keeps the Python hot path short: a single persistent asyncio WebSocket connection, fast JSON parsing with ujson, and a bounded in-memory signal queue that is checked on every tick:
import asyncio
import time
from collections import deque
from dataclasses import dataclass
from typing import Optional

import ujson  # typically faster than stdlib json; benchmark on your payloads
import websockets


@dataclass
class OrderSignal:
    """Pre-computed signal from the LLM slow path."""
    symbol: str
    side: str         # 'buy' | 'sell'
    quantity: float
    max_price: float  # Buy: price ceiling. Sell: price floor.
    urgency: int      # 1-10; affects order type selection
    expires_ms: int   # Unix time in ms after which the signal is stale


class FastPathExecutor:
    """
    Deterministic, low-latency execution engine.
    Receives pre-computed signals, handles order lifecycle.
    Zero LLM calls on this path.
    """

    def __init__(self, ws_url: str, api_key: str):
        self.ws_url = ws_url
        self.api_key = api_key
        self.ws: Optional[websockets.WebSocketClientProtocol] = None
        # Bounded queue: the oldest signal is dropped once 1000 are pending
        self.signal_queue: deque[OrderSignal] = deque(maxlen=1000)
        self.open_orders: dict = {}
        self.position: float = 0.0
        self.pnl: float = 0.0

    async def connect(self):
        """Establish a persistent WebSocket connection."""
        self.ws = await websockets.connect(
            self.ws_url,
            # websockets >= 14 names this parameter additional_headers
            extra_headers={"X-API-Key": self.api_key},
            ping_interval=20,
            ping_timeout=10,
            compression=None,  # Disabled: it costs more latency than it saves
            max_size=2**20,
        )

    async def market_data_loop(self):
        """Process incoming market data at maximum speed."""
        async for raw in self.ws:
            recv_ns = time.perf_counter_ns()
            try:
                msg = ujson.loads(raw)
                await self._process_message(msg, recv_ns)
            except Exception as e:
                # Never crash on malformed data
                self._log_error(e)

    async def _process_message(self, msg: dict, recv_ns: int):
        msg_type = msg.get("type")
        if msg_type == "tick":
            await self._on_tick(msg, recv_ns)
        elif msg_type == "fill":
            await self._on_fill(msg)
        elif msg_type == "reject":
            await self._on_reject(msg)

    async def _on_tick(self, tick: dict, recv_ns: int):
        """Check the signal queue on every tick. No LLM call."""
        now_ms = time.time() * 1000
        price = float(tick["price"])
        while self.signal_queue:
            signal = self.signal_queue[0]
            # Drop stale signals
            if now_ms > signal.expires_ms:
                self.signal_queue.popleft()
                continue
            # Assumption: ticks carry a "symbol" field. Without one, the
            # head signal is matched against every tick, as before.
            if tick.get("symbol", signal.symbol) != signal.symbol:
                break
            # Check whether the current price satisfies the signal
            if signal.side == "buy" and price <= signal.max_price:
                self.signal_queue.popleft()
                await self._place_order(signal, price)
            elif signal.side == "sell" and price >= signal.max_price:
                self.signal_queue.popleft()
                await self._place_order(signal, price)
            else:
                break  # Price not met, wait for the next tick

    async def _place_order(self, signal: OrderSignal, price: float):
        """Send order. Target: < 1ms from call to wire."""
        order = {
            "action": "place_order",
            "symbol": signal.symbol,
            "side": signal.side,
            "quantity": signal.quantity,
            "price": price,
            "type": "limit" if signal.urgency < 8 else "market",
            "timestamp": int(time.time() * 1000),
        }
        await self.ws.send(ujson.dumps(order))

    async def _on_fill(self, fill: dict):
        """Placeholder: update position and PnL from the fill message."""

    async def _on_reject(self, reject: dict):
        """Placeholder: handle a rejected order (log, retry, or drop)."""

    def _log_error(self, exc: Exception):
        """Placeholder: route to your logger; must not raise."""
        print(f"market data error: {exc!r}")

    def push_signal(self, signal: OrderSignal):
        """Inject a signal from the slow path (deque.append is atomic in CPython)."""
        self.signal_queue.append(signal)

    async def run(self):
        await self.connect()
        await self.market_data_loop()
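ujson is typically faster than the standard json module for both parsing and serialization. The commonly quoted figure is about 3x, but measure it on your own message shapes. asyncio already sets TCP_NODELAY on the TCP connections it creates, so verify that it is enabled instead of assuming you must set it yourself. Use time.perf_counter_ns() for nanosecond-resolution interval timing. Consider uvloop as a drop-in replacement for the asyncio event loop; its maintainers report 2-4x throughput improvements on Linux.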
3. Order Types and Routing
Order type selection is a critical execution decision that directly impacts fill rate and slippage. HFT agents must make this decision algorithmically based on current market conditions, position urgency, and spread width.
Order Type Decision Logic
from enum import Enum


class OrderType(Enum):
    MARKET = "market"
    LIMIT = "limit"
    IOC = "ioc"              # Immediate-or-cancel
    FOK = "fok"              # Fill-or-kill
    POST_ONLY = "post_only"  # Maker-only


def select_order_type(
    side: str,
    urgency: int,         # 1-10: how urgently we need the fill
    spread_bps: float,    # current bid-ask spread in basis points
    position_pct: float,  # fraction of intended position already filled
    maker_rebate: float,  # exchange maker rebate in bps
) -> OrderType:
    """
    Select an order type based on market conditions.
    Rules are evaluated in priority order; the first match wins.
    """
    # Rule 1: Always market if urgency is critical
    if urgency >= 9:
        return OrderType.MARKET
    # Rule 2: Post-only when the spread is wide and there is no urgency.
    # Captures the spread and earns the maker rebate.
    if spread_bps > 20 and urgency <= 3 and maker_rebate > 0:
        return OrderType.POST_ONLY
    # Rule 3: FOK for final fills near the position target
    if position_pct >= 0.9 and urgency >= 6:
        return OrderType.FOK
    # Rule 4: IOC when the spread is tight and urgency is moderate
    if spread_bps < 5 and urgency >= 5:
        return OrderType.IOC
    # Default: standard limit order
    return OrderType.LIMIT


def compute_limit_price(
    side: str,
    mid_price: float,
    spread_bps: float,
    urgency: int,
) -> float:
    """
    Compute a limit price that balances fill probability
    against price improvement.
    Higher urgency = more aggressive (moves toward the far side of the spread).
    """
    # spread_bps / 10000 is the full spread as a fraction; halve it
    half_spread = mid_price * (spread_bps / 20000)
    aggressiveness = urgency / 10  # 0.1 - 1.0 for urgency 1-10
    if side == "buy":
        # Passive: mid - half_spread, Aggressive: mid + half_spread
        offset = half_spread * (2 * aggressiveness - 1)
        return round(mid_price + offset, 6)
    else:
        # Passive: mid + half_spread, Aggressive: mid - half_spread
        offset = half_spread * (1 - 2 * aggressiveness)
        return round(mid_price + offset, 6)
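Both functions above take `spread_bps` and a mid price as inputs. A small helper like the following could derive them from the top of the book. The name `mid_and_spread_bps` and the validation rule are assumptions for illustration.

```python
def mid_and_spread_bps(best_bid: float, best_ask: float) -> tuple[float, float]:
    """Return (mid_price, spread in basis points) from the top of book."""
    if best_bid <= 0 or best_ask < best_bid:
        raise ValueError("crossed or empty book")
    mid = (best_bid + best_ask) / 2
    spread_bps = (best_ask - best_bid) / mid * 10_000
    return mid, spread_bps


mid, spread = mid_and_spread_bps(99.95, 100.05)
# mid is about 100.0 and the spread about 10 bps, so half_spread in
# compute_limit_price is about 0.05: a buy at urgency 5 prices at the mid,
# and a buy at urgency 10 prices at roughly the best ask.
```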
4. Risk Controls and Circuit Breakers
Risk management is not optional for HFT agents — it is the difference between a controlled loss and a total capital wipeout. Every HFT agent must implement hard circuit breakers that halt trading autonomously, without requiring human intervention.
Circuit Breaker Parameters
| Parameter | Conservative | Standard | Aggressive | Action on Breach |
|---|---|---|---|---|
| Max daily loss | 1% NAV | 3% NAV | 5% NAV | Halt all trading |
| Max position size | 5% NAV | 15% NAV | 30% NAV | Reject new orders |
| Max order rate | 10/sec | 50/sec | 200/sec | Rate limit queue |
| Consecutive losses | 3 | 7 | 15 | Pause 5 minutes |
| Slippage threshold | 0.1% | 0.3% | 0.5% | Switch to limit orders |
| Drawdown from HWM | 5% | 10% | 20% | Reduce position sizes |
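A minimal sketch of how three of these parameters might be enforced in the fast path, using the Standard column as defaults. The class and field names are hypothetical, and the timed pause, order rate limit, slippage, and drawdown rules are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class BreakerLimits:
    # Defaults mirror the "Standard" column of the table above
    max_daily_loss_pct: float = 3.0
    max_position_pct: float = 15.0
    max_consecutive_losses: int = 7


@dataclass
class CircuitBreaker:
    nav: float
    limits: BreakerLimits = field(default_factory=BreakerLimits)
    daily_pnl: float = 0.0
    consecutive_losses: int = 0
    halted: bool = False

    def record_trade(self, pnl: float) -> None:
        """Update counters after a closed trade; trip the halt on daily loss."""
        self.daily_pnl += pnl
        self.consecutive_losses = self.consecutive_losses + 1 if pnl < 0 else 0
        if -self.daily_pnl >= self.nav * self.limits.max_daily_loss_pct / 100:
            self.halted = True  # stays set until an operator resets it

    def allow_order(self, position_value_after: float) -> tuple[bool, str]:
        """Gate a new order; returns (allowed, reason)."""
        if self.halted:
            return False, "daily loss limit breached"
        if self.consecutive_losses >= self.limits.max_consecutive_losses:
            return False, "consecutive loss pause"
        if abs(position_value_after) > self.nav * self.limits.max_position_pct / 100:
            return False, "position size limit"
        return True, "ok"
```

Calling `allow_order` before every order keeps the check on the deterministic path, so no model output can bypass it.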
If your agent has a bug, its kill switch may share that bug. Implement an out-of-band kill switch: a separate process, ideally on separate infrastructure, that monitors your agent's heartbeat and can revoke API credentials at the infrastructure level, independent of your agent's code execution.
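The monitoring side of such a kill switch can be sketched as a watchdog with an injected `revoke` hook. Both the hook and the heartbeat transport are assumptions here; in practice `revoke` would call whatever credential-management interface your infrastructure exposes.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Runs in a separate process; trips once if the agent goes silent."""

    def __init__(self, timeout_s: float, revoke: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.revoke = revoke      # hypothetical hook, e.g. disable the API key
        self.clock = clock
        self.last_beat = clock()
        self.tripped = False

    def beat(self) -> None:
        """Called whenever a heartbeat arrives from the agent."""
        self.last_beat = self.clock()

    def check(self) -> bool:
        """Call periodically; returns True once credentials were revoked."""
        if not self.tripped and self.clock() - self.last_beat > self.timeout_s:
            self.tripped = True
            self.revoke()
        return self.tripped
```

The watchdog fails closed: silence from the agent, for any reason, leads to revocation, and nothing in the agent's own code can undo it.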
5. Market Microstructure Strategies
Understanding market microstructure — how orders flow, where liquidity accumulates, and how price discovery happens — is the foundation of profitable HFT strategies. The following strategies are implementable by AI agents without requiring proprietary exchange co-location:
Market Making
Post limit orders on both sides of the book, earn the bid-ask spread minus exchange fees. Requires tight inventory management to avoid directional exposure.
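A minimal quoting sketch, assuming a simple linear inventory skew. The `skew_bps` parameter and the function name are illustrative, not part of any exchange API.

```python
def make_quotes(mid: float, half_spread_bps: float, inventory: float,
                max_inventory: float, skew_bps: float = 5.0) -> tuple[float, float]:
    """Return (bid, ask); both shift down when long and up when short."""
    ratio = max(-1.0, min(1.0, inventory / max_inventory))
    center = mid * (1 - ratio * skew_bps / 10_000)
    half = mid * half_spread_bps / 10_000
    return center - half, center + half


flat_bid, flat_ask = make_quotes(100.0, 10.0, inventory=0.0, max_inventory=10.0)
long_bid, long_ask = make_quotes(100.0, 10.0, inventory=10.0, max_inventory=10.0)
```

With a full long inventory both quotes sit lower, which makes the ask more likely to fill and works the position back toward flat.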
Cross-Venue Arb
Exploit price discrepancies between Purple Flea and other venues. Works best with correlated assets and fast WebSocket feeds.
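A discrepancy is only worth trading if it survives fees on both legs. The sketch below computes the net edge in basis points, approximating both taker fees as bps of notional; transfer costs, gas, and slippage are ignored and would reduce the edge further.

```python
def net_edge_bps(buy_ask: float, sell_bid: float,
                 buy_fee_bps: float, sell_fee_bps: float) -> float:
    """Edge in bps from buying at one venue's ask and selling at another's bid."""
    gross_bps = (sell_bid - buy_ask) / buy_ask * 10_000
    return gross_bps - buy_fee_bps - sell_fee_bps


edge = net_edge_bps(buy_ask=100.00, sell_bid=100.30, buy_fee_bps=10, sell_fee_bps=10)
take_trade = edge > 0  # about 30 bps gross, about 10 bps after fees
```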
Mean Reversion
Trade short-term price deviations from a statistical mean. Requires high fill rates and tight position sizing to manage drawdown risk.
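One common way to express "deviation from a statistical mean" is a rolling z-score. The sketch below is one such formulation under assumed parameters (`window`, `entry_z`); it is not a tuned strategy.

```python
from collections import deque
from statistics import mean, pstdev


class ZScoreSignal:
    def __init__(self, window: int = 20, entry_z: float = 2.0) -> None:
        self.prices: deque[float] = deque(maxlen=window)
        self.entry_z = entry_z

    def update(self, price: float) -> str:
        """Return 'buy', 'sell', or 'hold' for the newest price."""
        history = list(self.prices)   # window before the new price arrives
        self.prices.append(price)
        if len(history) < self.prices.maxlen:
            return "hold"             # not enough data yet
        sigma = pstdev(history)
        if sigma == 0:
            return "hold"
        z = (price - mean(history)) / sigma
        if z <= -self.entry_z:
            return "buy"              # price far below its recent mean
        if z >= self.entry_z:
            return "sell"             # price far above its recent mean
        return "hold"
```

The new price is scored against the window that preceded it, so a single outlier cannot dampen its own z-score.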
Micro Momentum
Follow short-term order flow imbalances. Identify when one side of the book is being consumed faster and ride the resulting price move.
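A minimal sketch of a flow imbalance measure, assuming each recent trade is available as an `(aggressor_side, size)` pair. That input shape and the 0.3 threshold are assumptions for illustration.

```python
def flow_imbalance(trades: list[tuple[str, float]]) -> float:
    """Value in [-1, 1]; positive means asks are being consumed faster than bids."""
    buy = sum(size for side, size in trades if side == "buy")
    sell = sum(size for side, size in trades if side == "sell")
    total = buy + sell
    return 0.0 if total == 0 else (buy - sell) / total


def momentum_side(imbalance: float, threshold: float = 0.3) -> str:
    """Map an imbalance reading to 'buy', 'sell', or 'hold'."""
    if imbalance >= threshold:
        return "buy"
    if imbalance <= -threshold:
        return "sell"
    return "hold"


recent = [("buy", 3.0), ("sell", 1.0)]
side = momentum_side(flow_imbalance(recent))
```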
6. Purple Flea Trading API Integration
The Purple Flea Trading API provides a WebSocket feed for real-time market data and order management, purpose-built for high-frequency agent operation. The API supports persistent connections with sub-5ms message latency for co-located clients.
Connecting to the WebSocket Feed
import asyncio
import time
from typing import Awaitable, Callable

import ujson
import websockets


class PurpleFleaWS:
    """
    High-performance WebSocket client for Purple Flea Trading API.
    Implements automatic reconnection with exponential backoff.
    """
    WS_URL = "wss://purpleflea.com/trading-api/ws"

    def __init__(self, api_key: str,
                 on_tick: Callable[[dict, int], Awaitable[None]]):
        self.api_key = api_key
        self.on_tick = on_tick
        self.connected = False
        self._backoff = 1.0
        self._max_backoff = 60.0
        self.latency_samples: list[int] = []

    async def run_forever(self, symbols: list[str]):
        """Connect and stream, reconnecting on disconnect."""
        while True:
            try:
                await self._connect_and_stream(symbols)
            except Exception as e:
                print(f"WS error: {e}. Reconnecting in {self._backoff}s")
                await asyncio.sleep(self._backoff)
                self._backoff = min(self._backoff * 2, self._max_backoff)
            finally:
                self.connected = False

    async def _connect_and_stream(self, symbols: list[str]):
        async with websockets.connect(
            self.WS_URL,
            # websockets >= 14 names this parameter additional_headers
            extra_headers={"X-API-Key": self.api_key},
            compression=None,  # Trade bandwidth for latency
            ping_interval=10,
            ping_timeout=5,
        ) as ws:
            self.connected = True
            self._backoff = 1.0  # Reset once a connection succeeds
            print("Connected to Purple Flea Trading WS")
            # Subscribe to symbol feeds
            await ws.send(ujson.dumps({
                "action": "subscribe",
                "channels": ["ticker", "orderbook", "fills"],
                "symbols": symbols,
            }))
            async for msg_raw in ws:
                recv_ns = time.perf_counter_ns()  # monotonic, for local timing
                recv_wall_ns = time.time_ns()     # wall clock, comparable to exchange time
                msg = ujson.loads(msg_raw)
                # One-way feed latency. Assumes exchange_ts is Unix time in
                # microseconds and that local and exchange clocks are synced.
                if "exchange_ts" in msg:
                    latency_us = (recv_wall_ns - msg["exchange_ts"] * 1000) // 1000
                    self.latency_samples.append(latency_us)
                    if len(self.latency_samples) >= 1000:
                        avg = sum(self.latency_samples) / 1000
                        print(f"Avg latency: {avg:.0f}us")
                        self.latency_samples.clear()
                await self.on_tick(msg, recv_ns)


# Example usage. FastPathExecutor is the class defined in section 2.
async def main():
    executor = FastPathExecutor(
        ws_url="wss://purpleflea.com/trading-api/ws",
        api_key="your-api-key",
    )
    feed = PurpleFleaWS(
        api_key="your-api-key",
        on_tick=executor._process_message,
    )
    await asyncio.gather(
        feed.run_forever(["BTC-USDC", "ETH-USDC"]),
        executor.run(),
    )


if __name__ == "__main__":
    asyncio.run(main())
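The client above doubles its reconnect delay up to a 60-second cap. When many agents lose their connection at once, identical delays can make them all reconnect in the same instant. Adding random jitter is a common mitigation. It is not part of the client above, and the sketch below shows one possible variant ("full jitter") as a standalone generator.

```python
import random
from itertools import islice
from typing import Iterator, Optional


def backoff_delays(base: float = 1.0, cap: float = 60.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield reconnect delays: exponential ceiling, capped, with full jitter."""
    rng = rng or random.Random()
    ceiling = base
    while True:
        # Pick uniformly in [0, ceiling] so clients spread out over the window
        yield rng.uniform(0, ceiling)
        ceiling = min(ceiling * 2, cap)


# First ten delays; their ceilings are 1, 2, 4, 8, 16, 32, 60, 60, 60, 60 seconds
delays = list(islice(backoff_delays(rng=random.Random(0)), 10))
```

A reconnect loop would call `next()` on a fresh generator after each successful connection, which plays the same role as resetting `_backoff` to 1.0.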
New trading agents can bootstrap capital by claiming funds from faucet.purpleflea.com before going live on the Trading API. The faucet is designed for exactly this use case — giving new agents a funded starting position without requiring an initial deposit.