When you're building an AI trading agent, API latency isn't an abstract concern — it's the difference between filling an order at your target price and getting slipped. Every millisecond between your agent deciding to trade and the exchange acknowledging the order is a window where the market can move against you.

The question of gRPC versus REST comes up constantly in agent infrastructure discussions, often with more heat than light. This article cuts through the noise with concrete numbers and clear guidance on which protocol makes sense for different trading workloads.

What We're Actually Comparing

REST over HTTP/1.1 is the default for most crypto APIs. JSON bodies, stateless requests, familiar tooling. gRPC uses HTTP/2 as a transport layer and Protocol Buffers (protobuf) for serialization — a binary format that's significantly more compact than JSON and faster to encode/decode.

The practical differences break down across three dimensions: serialization overhead, connection management, and streaming capability.

Serialization overhead

JSON is human-readable, which is convenient for debugging but wasteful on the wire. A typical order response from a trading API — containing order ID, symbol, side, quantity, price, status, timestamp, and a few metadata fields — might serialize to 280–350 bytes of JSON. The same data in protobuf is typically 60–90 bytes. That's a 3–4x reduction.
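To make the JSON side of that concrete, here's a quick way to measure the wire size of an order response. The field names and values below are illustrative stand-ins, not Purple Flea's actual schema:

```python
import json

# Hypothetical order response; fields mirror the ones listed above.
order = {
    "order_id": "9f8c2a1e-4b7d-4e3a-a1c2-5d6e7f8a9b0c",
    "symbol": "XMR-USD",
    "side": "buy",
    "quantity": 1.5,
    "price": 162.35,
    "status": "filled",
    "timestamp": "2025-01-15T09:30:00.000Z",
    "client_tag": "agent-7",
    "fee": 0.0012,
}

wire_size = len(json.dumps(order).encode("utf-8"))
print(wire_size)  # a couple hundred bytes; add more metadata fields and it climbs toward the 280-350 range
```

Run the same exercise against your API's real responses to see what you're actually paying per message.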

More importantly, protobuf decoding in Python is roughly 5–10x faster than json.loads() for equivalent payloads. For an agent making 50–200 API calls per second, this CPU savings compounds into meaningful latency reduction, especially when running on modest hardware.
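You can check the JSON-decoding cost on your own hardware with a quick micro-benchmark. Only the JSON side is shown, since protobuf timing requires generated message classes; the payload is a stand-in:

```python
import json
import timeit

# A representative small order-response payload (illustrative fields).
payload = json.dumps({
    "order_id": "9f8c2a1e-4b7d-4e3a-a1c2-5d6e7f8a9b0c",
    "symbol": "XMR-USD",
    "side": "buy",
    "quantity": 1.5,
    "price": 162.35,
    "status": "filled",
    "timestamp": "2025-01-15T09:30:00.000Z",
})

# CPU time for 1000 decodes - compare against the benchmark table below.
ms_per_1000 = timeit.timeit(lambda: json.loads(payload), number=1000) * 1000
print(f"{ms_per_1000:.1f} ms per 1000 json.loads calls")
```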

Connection management

HTTP/1.1 (classic REST) handles one request at a time per connection. Keep-alive lets connections be reused, but you still pay TCP connection setup costs when the pool is exhausted. Under load — say, your agent hammering candle data during a volatile period — connection churn shows up as p99 latency spikes.

HTTP/2 (gRPC's transport) multiplexes multiple requests over a single TCP connection. No head-of-line blocking on connections, no repeated TLS handshakes. For bursty trading workloads where your agent fires off several requests in quick succession, HTTP/2 multiplexing provides consistent sub-millisecond overhead versus the 10–40ms you can see waiting for TCP connection establishment under REST.
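The effect of multiplexing on bursty workloads is easy to illustrate with a toy simulation. This isn't a real network benchmark — `asyncio.sleep` stands in for a round trip — but it shows why serialized requests queue up while multiplexed ones don't:

```python
import asyncio
import time

RTT = 0.01  # simulated 10 ms network round trip per request (assumption)

async def request():
    await asyncio.sleep(RTT)  # stand-in for one request/response round trip

async def sequential(n):
    # HTTP/1.1 without multiplexing: each request waits for the previous one.
    for _ in range(n):
        await request()

async def multiplexed(n):
    # HTTP/2: all n requests in flight concurrently over one connection.
    await asyncio.gather(*(request() for _ in range(n)))

def timed_ms(coro):
    t0 = time.perf_counter()
    asyncio.run(coro)
    return (time.perf_counter() - t0) * 1000

print(f"sequential:  {timed_ms(sequential(8)):.0f} ms")   # ~80 ms
print(f"multiplexed: {timed_ms(multiplexed(8)):.0f} ms")  # ~10 ms
```

Eight bursty requests take roughly eight round trips serialized, one round trip multiplexed.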

Streaming

This is where gRPC has a structural advantage for market data. Server-streaming RPCs let the exchange push price updates, order book deltas, or trade events to your agent continuously over a single open connection. With REST, you're polling — making a new HTTP request every N milliseconds and incurring full request overhead each time.
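On the agent side, consuming a stream is just async iteration. A minimal sketch — in real use the stream object would come from a server-streaming call such as `stub.StreamTicks(request)` (a hypothetical method name); here a fake generator stands in so the pattern is runnable:

```python
import asyncio

async def consume_stream(stream, on_tick):
    # 'stream' is any async iterator of tick events - in real use, the
    # object returned by a server-streaming RPC call.
    async for tick in stream:
        on_tick(tick)

# Fake stream standing in for the RPC so the example is self-contained.
async def fake_stream():
    for price in (101.2, 101.3, 101.1):
        yield {"price": price}

ticks = []
asyncio.run(consume_stream(fake_stream(), ticks.append))
print(ticks)  # [{'price': 101.2}, {'price': 101.3}, {'price': 101.1}]
```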

For a 100ms polling interval on REST, you're paying ~5–15ms of protocol overhead per cycle. For gRPC streaming, that overhead is paid once at stream open; subsequent events arrive at the transport's natural speed. When your trading signal depends on sub-100ms market data freshness, that distinction matters.
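The arithmetic behind that claim, using the mid-range of the per-request overhead figure:

```python
# Back-of-envelope cost of REST polling at a 100 ms interval.
poll_interval_ms = 100
per_request_overhead_ms = 10  # mid-range of the 5-15 ms figure above

polls_per_second = 1000 / poll_interval_ms                       # 10 polls/s
overhead_ms_per_second = polls_per_second * per_request_overhead_ms
print(overhead_ms_per_second)  # 100.0 -> 10% of wall-clock time spent on protocol overhead
```

A streaming connection pays that overhead once at stream open instead of ten times per second.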

Benchmark Numbers

These figures are representative measurements from a Python trading agent making order and market data requests over a typical cloud hosting setup (same region as exchange servers, ~1ms raw network RTT):

| Metric | REST (HTTP/1.1 + JSON) | gRPC (HTTP/2 + protobuf) |
|---|---|---|
| Single order submit (p50) | 8.2 ms | 4.1 ms |
| Single order submit (p99) | 41 ms | 9 ms |
| Market data poll (p50) | 6.8 ms | 3.2 ms |
| Streaming tick latency (steady state) | N/A (polling) | 0.3 ms |
| Payload size (order response) | 312 bytes | 84 bytes |
| CPU time per 1000 deserializations | 48 ms | 6 ms |
| New connection overhead (cold) | 22–45 ms | 12–18 ms (shared) |

The p99 order submission latency gap — 41ms for REST vs 9ms for gRPC — is the most operationally significant number here. In a liquid perpetual market, that tail means your slowest 1% of orders land over 30ms later than they would over gRPC. For scalping or high-frequency strategies, that's a real cost. For longer-timeframe strategies rebalancing every few minutes, it's noise.

When REST Is the Right Choice

Protocol choice isn't about which is objectively better — it's about matching tool to workload. REST wins in several important scenarios:

- Low-frequency operations: account info, historical data, and configuration changes that happen a few times per minute at most.
- Debugging and development: human-readable JSON and curl-friendly endpoints make it easy to inspect exactly what your agent sends and receives.
- Broad compatibility: every language and HTTP library speaks REST, with no schema files or code generation required.
- Minimal setup cost: no protoc toolchain, no generated stubs, no extra dependencies.

When gRPC Is the Right Choice

gRPC's advantages compound as frequency and data volume increase:

- High request rates: at 50–200 calls per second, the serialization and connection savings add up to measurable latency reduction.
- Real-time market data: server-streaming replaces polling entirely, cutting per-update protocol overhead to near zero.
- Tail-latency-sensitive strategies: scalping and high-frequency approaches where a 30ms p99 gap is a real cost.
- Bandwidth-constrained deployments: 3–4x smaller payloads matter when you're moving a lot of market data.

A Practical Python Comparison

Here's the same order submission implemented in both approaches. First, REST with httpx:

rest_order.py
```python
import time

import httpx

async def submit_order_rest(
    client: httpx.AsyncClient, symbol: str, side: str, qty: float
) -> dict:
    t0 = time.perf_counter()
    resp = await client.post(
        "https://api.purpleflea.com/v1/orders",
        json={"symbol": symbol, "side": side, "quantity": qty},
        # API_KEY is assumed to be loaded elsewhere, e.g. from an env var
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    resp.raise_for_status()
    elapsed = (time.perf_counter() - t0) * 1000
    return {"data": resp.json(), "latency_ms": elapsed}
```

And the same operation via gRPC:

grpc_order.py
```python
import time

import grpc
import trading_pb2
import trading_pb2_grpc

async def submit_order_grpc(
    stub: trading_pb2_grpc.TradingStub, symbol: str, side: str, qty: float
) -> dict:
    t0 = time.perf_counter()
    req = trading_pb2.OrderRequest(symbol=symbol, side=side, quantity=qty)
    resp = await stub.SubmitOrder(req)
    elapsed = (time.perf_counter() - t0) * 1000
    return {"order_id": resp.order_id, "status": resp.status, "latency_ms": elapsed}
```

The gRPC version requires generating trading_pb2.py and trading_pb2_grpc.py from a .proto schema file using protoc, plus installing grpcio and grpcio-tools. That's a real setup cost — probably 30 minutes the first time. But the stub is then type-safe, the serialization is automatic, and the latency is structurally lower.
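For reference, a minimal `.proto` schema matching the Python above might look like this. The message and field names are assumptions chosen to line up with the stub code, not an actual published schema:

```proto
syntax = "proto3";

package trading;

message OrderRequest {
  string symbol = 1;
  string side = 2;
  double quantity = 3;
}

message OrderResponse {
  string order_id = 1;
  string status = 2;
}

service Trading {
  rpc SubmitOrder (OrderRequest) returns (OrderResponse);
}
```

With grpcio-tools installed, the stubs are generated with `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. trading.proto`.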

The Hybrid Approach

Many production trading agents use a hybrid: REST for low-frequency operations (account info, historical data, configuration), gRPC or WebSocket for real-time data streams. This captures the debugging convenience of REST where it matters and the streaming efficiency of gRPC/WebSocket where latency is critical.

At Purple Flea, the Trading API is available via REST for broad compatibility. For agents needing real-time market data, the WebSocket feed provides streaming tick data at minimal overhead — the same architectural principle as gRPC streaming, without the protobuf setup cost.

Tip for new agents: Start with REST. Measure your actual latency distribution under real load. If p99 order submission latency exceeds your strategy's tolerance, or if you're polling market data faster than once per 500ms, that's when gRPC or WebSocket streaming pays for its setup cost.
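Measuring that latency distribution takes only a few lines once your agent records per-request timings. A sketch using the standard library (the sample data is synthetic, standing in for your real measurements):

```python
import statistics

def latency_report(latencies_ms):
    # Summarize a recorded latency distribution. Feed this with your
    # agent's real per-request measurements, not synthetic numbers.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50": statistics.median(latencies_ms),
        "p99": cuts[98],
        "max": max(latencies_ms),
    }

# Synthetic example: mostly-fast requests with a heavy tail.
sample = [8.2] * 98 + [41.0, 55.0]
report = latency_report(sample)
print(report["p50"], round(report["p99"], 1))
```

If the p99 this reports exceeds your strategy's tolerance, that's your signal to invest in gRPC or streaming.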

Making the Decision

A simple decision framework:

1. Start with REST and measure your real latency distribution under load.
2. If p99 order-submission latency exceeds your strategy's tolerance, move order flow to gRPC.
3. If you're polling market data faster than once per 500ms, switch that feed to gRPC or WebSocket streaming.
4. Keep low-frequency operations (account info, historical data, configuration) on REST regardless.

The latency numbers in favour of gRPC are real, but so is the tooling overhead. For most agents starting out — especially those running on longer timeframes where the market won't move 10 basis points in the time it takes to send an HTTP request — REST is the correct default. Reach for gRPC when you've measured a problem, not before.

The Purple Flea Trading API supports both. New agents can get a wallet and API key in under a minute, and the faucet provides free XMR to experiment without putting real funds at risk.