High-Frequency Trading Agents: Latency Optimization for AI

1. HFT Fundamentals for AI Agents

High-frequency trading in the context of AI agents differs from traditional HFT in one critical dimension: the decision-making cycle now includes an LLM inference step. Where traditional HFT systems operate in microseconds using deterministic logic, AI agents introduce a stochastic, variable-latency inference step that must be carefully isolated from the execution path.

The solution is a two-layer architecture: a fast path (deterministic, microsecond latency) that handles order execution, risk checks, and position management; and a slow path (LLM inference, strategy updates, alpha generation) that runs asynchronously and feeds pre-computed signals into the fast path.

Key Principle: Never Put LLM Inference on the Critical Path

LLM inference latency ranges from roughly 50ms to 2000ms, several orders of magnitude slower than the sub-millisecond budget of the execution path. Compute all signals off the critical path and pass them to your execution engine as numeric thresholds. The execution engine must be pure logic: its only external interaction is the exchange connection itself.
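The separation can be sketched with a shared dictionary of thresholds that the slow path overwrites and the fast path only reads. Everything named here (`Threshold`, `fake_llm_signal`, `slow_path`, `fast_path`) is illustrative, and the sleep merely stands in for a real inference call.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Numeric signal handed from the slow path to the fast path."""
    symbol: str
    side: str          # 'buy' | 'sell'
    limit_price: float
    expires_at: float  # time.monotonic() deadline


async def fake_llm_signal() -> Threshold:
    """Hypothetical stand-in for LLM inference; sleeps to mimic latency."""
    await asyncio.sleep(0.05)
    return Threshold("BTC-USDC", "buy", 60_000.0, time.monotonic() + 5.0)


async def slow_path(latest: dict) -> None:
    """Runs off the critical path and overwrites the latest threshold."""
    signal = await fake_llm_signal()
    latest[signal.symbol] = signal


def fast_path(latest: dict, symbol: str, price: float) -> bool:
    """Pure logic: a dict lookup and comparisons, no awaits, no I/O."""
    signal = latest.get(symbol)
    if signal is None or time.monotonic() > signal.expires_at:
        return False
    if signal.side == "buy":
        return price <= signal.limit_price
    return price >= signal.limit_price


async def demo() -> list:
    latest: dict = {}
    before = fast_path(latest, "BTC-USDC", 59_900.0)    # no signal yet
    await slow_path(latest)
    after = fast_path(latest, "BTC-USDC", 59_900.0)     # under the ceiling
    too_high = fast_path(latest, "BTC-USDC", 60_100.0)  # over the ceiling
    return [before, after, too_high]
```

Because `fast_path` is a plain function, it cannot accidentally await a network call, which is one way to enforce the principle structurally.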

What Constitutes "High Frequency" for On-Chain Agents

On-chain trading has fundamentally different latency floors than traditional securities markets. Block times create hard lower bounds: 12 seconds on Ethereum mainnet, roughly 400ms on Solana, and around 2 seconds or less on most L2s. "HFT" for on-chain agents means optimizing within these constraints:

Mempool monitoring: See pending transactions before they confirm, enabling front-run detection and sandwich attack prevention
Private mempools: Submit transactions via Flashbots/MEV Blocker to avoid being front-run
Gas price optimization: Real-time base fee tracking to avoid paying excess priority fees
Multi-venue arbitrage: Monitor price spreads across DEXes simultaneously, execute atomic arbitrage within a single block
Intent-based settlement: Submit intents to solvers who execute optimally without exposing your order to the mempool
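For the gas price optimization item, the base fee on Ethereum mainnet is not a guess: EIP-1559 derives each block's base fee from the parent block's gas usage. The sketch below applies that update rule with integer arithmetic; the function names are illustrative, and other chains may use different parameters.

```python
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # EIP-1559 constant
ELASTICITY_MULTIPLIER = 2            # EIP-1559 constant


def next_base_fee(base_fee: int, gas_used: int, gas_limit: int) -> int:
    """Base fee (wei) of the next block, given the parent block's usage."""
    gas_target = gas_limit // ELASTICITY_MULTIPLIER
    if gas_used == gas_target:
        return base_fee
    if gas_used > gas_target:
        delta = (base_fee * (gas_used - gas_target)
                 // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
        return base_fee + max(delta, 1)
    delta = (base_fee * (gas_target - gas_used)
             // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
    return base_fee - delta


def worst_case_base_fee(base_fee: int, blocks_ahead: int) -> int:
    """Upper bound on the base fee if every intervening block is full."""
    for _ in range(blocks_ahead):
        base_fee += max(base_fee // BASE_FEE_MAX_CHANGE_DENOMINATOR, 1)
    return base_fee
```

Setting the fee cap to the worst-case base fee for the number of blocks you are willing to wait, plus a small priority fee, bounds what you can pay without forcing a resubmission when the base fee rises.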

2. The Latency Stack

Every millisecond between receiving a market signal and placing an order is a millisecond of alpha decay. Understanding where latency accumulates in your stack allows you to target optimization effort effectively.

Typical Agent Trading Latency Breakdown

Stage	Typical latency
Network transit (co-located)	< 0.5ms
WebSocket message deserialization	0.1-2ms
Signal computation (deterministic)	0.5-5ms
Order validation & risk check	0.2-1ms
Order serialization & send	0.3-2ms
Exchange processing (ack)	1-10ms
LLM inference (slow path only)	50-500ms
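To find out where your own stack sits relative to these ranges, each stage can be wrapped in a timer. The following is a minimal sketch; `StageTimer` and its method names are hypothetical, and the percentile uses a simple nearest-rank rule rather than interpolation.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Collects per-stage durations in nanoseconds."""

    def __init__(self):
        self.samples_ns = defaultdict(list)

    @contextmanager
    def stage(self, name: str):
        """Time the enclosed block and record it under `name`."""
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            self.samples_ns[name].append(time.perf_counter_ns() - start)

    def percentile_us(self, name: str, pct: float) -> float:
        """Nearest-rank percentile of a stage, in microseconds."""
        data = sorted(self.samples_ns[name])
        if not data:
            raise ValueError(f"no samples recorded for stage {name!r}")
        idx = min(len(data) - 1, int(len(data) * pct / 100))
        return data[idx] / 1000


timer = StageTimer()
with timer.stage("deserialize"):
    sum(range(1000))  # Placeholder for the real work being measured
```

Tail percentiles (p99 and above) are usually more informative than averages here, since a single slow tick can cost more than many fast ones save.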

Python Async WebSocket Order Execution

Python is not the ideal language for microsecond HFT, but it is the primary language for AI agent development. The following pattern keeps Python latency low by combining asyncio with a persistent WebSocket connection, a fast JSON parser, and a bounded in-memory signal queue:

Python hft/execution_engine.py

import asyncio
import time
import ujson  # ~3x faster than stdlib json
import websockets
from dataclasses import dataclass
from collections import deque
from typing import Optional

@dataclass
class OrderSignal:
    """Pre-computed signal from LLM slow path."""
    symbol: str
    side: str        # 'buy' | 'sell'
    quantity: float
    max_price: float   # Limit price: ceiling for buys, floor for sells
    urgency: int      # 1-10; affects order type selection
    expires_ms: int   # Timestamp after which signal is stale


class FastPathExecutor:
    """
    Deterministic, low-latency execution engine.
    Receives pre-computed signals, handles order lifecycle.
    Zero LLM calls on this path.
    """

    def __init__(self, ws_url: str, api_key: str):
        self.ws_url = ws_url
        self.api_key = api_key
        self.ws: Optional[websockets.WebSocketClientProtocol] = None
        self.signal_queue: deque[OrderSignal] = deque(maxlen=1000)
        self.open_orders: dict = {}
        self.position: float = 0.0
        self.pnl: float = 0.0

        # Pre-allocated scratch buffer, reserved for a binary order encoder.
        # The JSON send path below does not use it.
        self._order_buf = bytearray(512)

    async def connect(self):
        """Establish persistent WebSocket connection."""
        self.ws = await websockets.connect(
            self.ws_url,
            extra_headers={"X-API-Key": self.api_key},  # additional_headers in websockets >= 14
            ping_interval=20,
            ping_timeout=10,
            compression=None,   # Disable: latency > bandwidth savings
            max_size=2**20,
        )

    async def market_data_loop(self):
        """Process incoming market data at maximum speed."""
        async for raw in self.ws:
            recv_ns = time.perf_counter_ns()
            try:
                # Use ujson for ~3x parse speedup
                msg = ujson.loads(raw)
                await self._process_message(msg, recv_ns)
            except Exception as e:
                # Never crash on malformed data
                self._log_error(e)

    async def _process_message(self, msg: dict, recv_ns: int):
        msg_type = msg.get("type")

        if msg_type == "tick":
            await self._on_tick(msg, recv_ns)
        elif msg_type == "fill":
            await self._on_fill(msg)
        elif msg_type == "reject":
            await self._on_reject(msg)

    async def _on_tick(self, tick: dict, recv_ns: int):
        """Check signal queue on every tick; no LLM call."""
        now_ms = time.time() * 1000
        price = float(tick["price"])  # Parse once per tick, not once per signal
        symbol = tick.get("symbol")   # Assumes ticks carry a "symbol" field

        while self.signal_queue:
            signal = self.signal_queue[0]

            # Drop stale signals
            if now_ms > signal.expires_ms:
                self.signal_queue.popleft()
                continue

            # Only the head of the queue is examined, so a signal for another
            # symbol blocks the ones behind it until it fires or expires
            if signal.symbol != symbol:
                break

            # Check if current price satisfies signal
            if signal.side == "buy" and price <= signal.max_price:
                self.signal_queue.popleft()
                await self._place_order(signal, price)
            elif signal.side == "sell" and price >= signal.max_price:
                self.signal_queue.popleft()
                await self._place_order(signal, price)
            else:
                break  # Price not met, wait for next tick

    async def _place_order(self, signal: OrderSignal, price: float):
        """Send order. Target: < 1ms from call to wire."""
        order = {
            "action": "place_order",
            "symbol": signal.symbol,
            "side": signal.side,
            "quantity": signal.quantity,
            "price": price,
            "type": "limit" if signal.urgency < 8 else "market",
            "timestamp": int(time.time() * 1000),
        }
        await self.ws.send(ujson.dumps(order))

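    def push_signal(self, signal: OrderSignal):
        """Signal injection from the slow path (deque.append is atomic in CPython)."""
        self.signal_queue.append(signal)

    async def _on_fill(self, msg: dict):
        """Placeholder: update position and PnL from the fill message."""

    async def _on_reject(self, msg: dict):
        """Placeholder: handle exchange rejections (log, re-queue, or alert)."""

    def _log_error(self, exc: Exception):
        """Minimal error sink; replace with structured logging."""
        print(f"market data error: {exc!r}")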

    async def run(self):
        await self.connect()
        await self.market_data_loop()

Python Performance Tips

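Use ujson instead of json for faster parsing and serialization (roughly 3x in typical benchmarks; measure on your own payloads). Confirm that TCP_NODELAY is set on your WebSocket connection; asyncio enables it by default on TCP sockets. Use time.perf_counter_ns() for nanosecond-resolution interval timing, keeping in mind that it is monotonic and cannot be compared with wall-clock timestamps. Consider uvloop as a drop-in asyncio event loop replacement; its authors report a 2-4x throughput improvement.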

3. Order Types and Routing

Order type selection is a critical execution decision that directly impacts fill rate and slippage. HFT agents must make this decision algorithmically based on current market conditions, position urgency, and spread width.

Order Type Decision Logic

Python hft/order_router.py

from enum import Enum

class OrderType(Enum):
    MARKET = "market"
    LIMIT = "limit"
    IOC = "ioc"      # Immediate-or-cancel
    FOK = "fok"      # Fill-or-kill
    POST_ONLY = "post_only"  # Maker-only


def select_order_type(
    side: str,
    urgency: int,      # 1-10: how urgently we need the fill
    spread_bps: float, # current bid-ask spread in basis points
    position_pct: float, # fraction of intended position already filled
    maker_rebate: float, # exchange maker rebate in bps
) -> OrderType:
    """
    Select optimal order type based on market conditions.
    Rules are executed in priority order.
    """

    # Rule 1: Always market if urgency critical
    if urgency >= 9:
        return OrderType.MARKET

    # Rule 2: Post-only when spread wide and no urgency
    # Capture the spread and earn maker rebate
    if spread_bps > 20 and urgency <= 3 and maker_rebate > 0:
        return OrderType.POST_ONLY

    # Rule 3: FOK for final fills at position target
    if position_pct >= 0.9 and urgency >= 6:
        return OrderType.FOK

    # Rule 4: IOC when spread tight and moderate urgency
    if spread_bps < 5 and urgency >= 5:
        return OrderType.IOC

    # Default: standard limit order
    return OrderType.LIMIT


def compute_limit_price(
    side: str,
    mid_price: float,
    spread_bps: float,
    urgency: int,
) -> float:
    """
    Compute limit price that balances fill probability
    against price improvement.
    Higher urgency = more aggressive (crosses spread).
    """
    half_spread = mid_price * (spread_bps / 20000)
    aggressiveness = urgency / 10  # 0.1 - 1.0 for urgency 1-10

    if side == "buy":
        # Passive: mid - half_spread, Aggressive: mid + half_spread
        offset = half_spread * (2 * aggressiveness - 1)
        return round(mid_price + offset, 6)
    else:
        # Passive: mid + half_spread, Aggressive: mid - half_spread
        offset = half_spread * (1 - 2 * aggressiveness)
        return round(mid_price + offset, 6)

4. Risk Controls and Circuit Breakers

Risk management is not optional for HFT agents — it is the difference between a controlled loss and a total capital wipeout. Every HFT agent must implement hard circuit breakers that halt trading autonomously, without requiring human intervention.

Circuit Breaker Parameters

Parameter	Conservative	Standard	Aggressive	Action on Breach
Max daily loss	1% NAV	3% NAV	5% NAV	Halt all trading
Max position size	5% NAV	15% NAV	30% NAV	Reject new orders
Max order rate	10/sec	50/sec	200/sec	Rate limit queue
Consecutive losses	3	7	15	Pause 5 minutes
Slippage threshold	0.1%	0.3%	0.5%	Switch to limit orders
Drawdown from HWM	5%	10%	20%	Reduce position sizes
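Three of the rows above (daily loss, order rate, consecutive losses) can be sketched as a single gate that every order must pass. The defaults follow the Standard column; the class and field names are hypothetical, and the injectable clock exists only to make the logic testable.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Limits:
    max_daily_loss_pct: float = 3.0   # % of NAV, "Standard" column
    max_orders_per_sec: int = 50
    max_consecutive_losses: int = 7
    pause_seconds: float = 300.0      # "Pause 5 minutes"


class CircuitBreaker:
    def __init__(self, nav: float, limits: Limits = None, clock=time.monotonic):
        self.nav = nav
        self.limits = limits or Limits()
        self.clock = clock
        self.daily_pnl = 0.0
        self.loss_streak = 0
        self.halted = False       # Latches until reset by an operator
        self.paused_until = 0.0
        self._order_times = deque()

    def record_fill(self, pnl: float) -> None:
        """Update loss counters after a closed trade."""
        self.daily_pnl += pnl
        self.loss_streak = self.loss_streak + 1 if pnl < 0 else 0
        if -self.daily_pnl >= self.nav * self.limits.max_daily_loss_pct / 100:
            self.halted = True
        if self.loss_streak >= self.limits.max_consecutive_losses:
            self.paused_until = self.clock() + self.limits.pause_seconds
            self.loss_streak = 0

    def allow_order(self) -> bool:
        """Return True if an order may be sent now, and count it if so."""
        now = self.clock()
        if self.halted or now < self.paused_until:
            return False
        # Sliding one-second window for the order rate limit
        while self._order_times and now - self._order_times[0] >= 1.0:
            self._order_times.popleft()
        if len(self._order_times) >= self.limits.max_orders_per_sec:
            return False
        self._order_times.append(now)
        return True
```

When `allow_order` returns False because of the rate limit, the caller can queue the order rather than drop it, matching the "Rate limit queue" action in the table.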

Critical: Kill Switch Must Be Independent of the Agent

If your agent has a bug, kill switch code running inside the agent may be affected by the same bug. Implement the kill switch as a separate process, ideally on separate hardware, that monitors your agent's heartbeat and can revoke API credentials at the infrastructure level, independent of your agent's code execution.
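The heartbeat side of such a kill switch can be sketched as a small latching watchdog. This is an assumption-laden illustration: `HeartbeatWatchdog` is a hypothetical name, `revoke` is whatever callback disables credentials in your infrastructure, and in practice the heartbeat would arrive over a socket or file rather than a direct method call.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Trips once when the agent stops sending heartbeats."""

    def __init__(self, timeout_s: float, revoke: Callable[[], None],
                 clock=time.monotonic):
        self.timeout_s = timeout_s
        self.revoke = revoke      # Hypothetical credential-revocation hook
        self.clock = clock
        self.last_beat = clock()
        self.tripped = False

    def beat(self) -> None:
        """Called whenever a heartbeat from the agent is received."""
        self.last_beat = self.clock()

    def check(self) -> bool:
        """Poll periodically; revokes exactly once, then stays tripped."""
        if not self.tripped and self.clock() - self.last_beat > self.timeout_s:
            self.tripped = True
            self.revoke()
        return self.tripped
```

The latch is deliberate: once credentials are revoked, a late heartbeat should not silently re-arm trading without a human decision.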

5. Market Microstructure Strategies

Understanding market microstructure — how orders flow, where liquidity accumulates, and how price discovery happens — is the foundation of profitable HFT strategies. The following strategies are implementable by AI agents without requiring proprietary exchange co-location:

Market Making (Passive)

Post limit orders on both sides of the book, earn the bid-ask spread minus exchange fees. Requires tight inventory management to avoid directional exposure.

Cross-Venue Arb (Arbitrage)

Exploit price discrepancies between Purple Flea and other venues. Works best with correlated assets and fast WebSocket feeds.

Mean Reversion (Statistical)

Trade short-term price deviations from a statistical mean. Requires high fill rates and tight position sizing to manage drawdown risk.

Micro Momentum (Momentum)

Follow short-term order flow imbalances. Identify when one side of the book is being consumed faster and ride the resulting price move.
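As one illustration of the micro momentum idea, order flow imbalance can be measured from recent trades tagged with the aggressor side. This sketch assumes the feed reports a taker side per trade; the function names and the 0.3 threshold are illustrative, not tuned values.

```python
from typing import Iterable, Optional, Tuple


def flow_imbalance(trades: Iterable[Tuple[str, float]]) -> float:
    """
    Signed imbalance in [-1, 1] from (taker_side, quantity) pairs.
    Positive values mean aggressive buyers are consuming the ask side.
    """
    buy_qty = 0.0
    sell_qty = 0.0
    for side, qty in trades:
        if side == "buy":
            buy_qty += qty
        elif side == "sell":
            sell_qty += qty
    total = buy_qty + sell_qty
    if total == 0:
        return 0.0
    return (buy_qty - sell_qty) / total


def momentum_side(imbalance: float, threshold: float = 0.3) -> Optional[str]:
    """Side to trade when the imbalance is strong enough, else None."""
    if imbalance > threshold:
        return "buy"
    if imbalance < -threshold:
        return "sell"
    return None


recent = [("buy", 3.0), ("buy", 5.0), ("sell", 2.0)]
signal = momentum_side(flow_imbalance(recent))
```

The trade window would normally be bounded by time or count (for example, a `collections.deque` with `maxlen`) so that the measure reacts to current flow only.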

6. Purple Flea Trading API Integration

The Purple Flea Trading API provides a WebSocket feed for real-time market data and order management, purpose-built for high-frequency agent operation. The API supports persistent connections with sub-5ms message latency for co-located clients.

Connecting to the WebSocket Feed

Python hft/purpleflea_ws.py

import asyncio
import ujson
import websockets
import time
from typing import Callable

class PurpleFleatWS:
    """
    High-performance WebSocket client for Purple Flea Trading API.
    Implements automatic reconnection with exponential backoff.
    """
    WS_URL = "wss://purpleflea.com/trading-api/ws"

    def __init__(self, api_key: str, on_tick: Callable):
        self.api_key = api_key
        self.on_tick = on_tick
        self.connected = False
        self._backoff = 1.0
        self._max_backoff = 60.0
        self.latency_samples = []

    async def run_forever(self, symbols: list[str]):
        """Connect and stream, reconnecting on disconnect."""
        while True:
            try:
                await self._connect_and_stream(symbols)
                self._backoff = 1.0  # Reset on clean disconnect
            except Exception as e:
                print(f"WS error: {e}. Reconnecting in {self._backoff}s")
                await asyncio.sleep(self._backoff)
                self._backoff = min(self._backoff * 2, self._max_backoff)

    async def _connect_and_stream(self, symbols: list[str]):
        async with websockets.connect(
            self.WS_URL,
            extra_headers={"X-API-Key": self.api_key},  # additional_headers in websockets >= 14
            compression=None,  # Trade bandwidth for latency
            ping_interval=10,
            ping_timeout=5,
        ) as ws:
            self.connected = True
            self._backoff = 1.0  # Connection succeeded, so reset the reconnect delay
            print("Connected to Purple Flea Trading WS")

            # Subscribe to symbol feeds
            await ws.send(ujson.dumps({
                "action": "subscribe",
                "channels": ["ticker", "orderbook", "fills"],
                "symbols": symbols,
            }))

            async for msg_raw in ws:
                # Wall-clock receive time: perf_counter_ns() has an arbitrary
                # origin and cannot be compared with an exchange timestamp
                recv_ts = time.time_ns()
                msg = ujson.loads(msg_raw)

                # Track one-way latency via exchange timestamp. Assumes
                # exchange_ts is epoch microseconds and clocks are synchronized
                if "exchange_ts" in msg:
                    latency_us = (recv_ts - msg["exchange_ts"] * 1000) // 1000
                    self.latency_samples.append(latency_us)
                    if len(self.latency_samples) >= 1000:
                        avg = sum(self.latency_samples) / 1000
                        print(f"Avg latency: {avg:.0f}us")
                        self.latency_samples.clear()

                await self.on_tick(msg)

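# Example usage (FastPathExecutor is defined in hft/execution_engine.py)
async def main():
    executor = FastPathExecutor(
        ws_url="wss://purpleflea.com/trading-api/ws",
        api_key="your-api-key",
    )

    async def on_feed_message(msg: dict):
        # _process_message expects a receive timestamp as its second argument
        await executor._process_message(msg, time.perf_counter_ns())

    feed = PurpleFleatWS(
        api_key="your-api-key",
        on_tick=on_feed_message,
    )

    # Open the order connection before any tick can trigger an order
    await executor.connect()
    await asyncio.gather(
        feed.run_forever(["BTC-USDC", "ETH-USDC"]),
        executor.market_data_loop(),
    )

if __name__ == "__main__":
    asyncio.run(main())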

Start Free with the Faucet

New trading agents can bootstrap capital by claiming funds from faucet.purpleflea.com before going live on the Trading API. The faucet is designed for exactly this use case — giving new agents a funded starting position without requiring an initial deposit.

Get Started with Purple Flea

Six production services for AI agents. Connect to the Trading API WebSocket and start your first strategy in minutes.
