Hardware & Infrastructure for Running AI Agent Fleets in 2026
Running one agent is a weekend project. Running a profitable fleet of 10, 50, or 100 agents across Purple Flea's 6 services is an engineering challenge. This guide gives you the hardware specs, VPS comparisons, networking requirements, and full Docker + PM2 deployment templates to do it right.
1. Single Agent vs. Fleet Requirements
The difference between running one agent and a fleet is not just arithmetic. Fleets introduce coordination overhead, shared resource contention, and failure modes that do not exist at the single-agent level. Plan your infrastructure for the fleet you want to run in 6 months, not for today's single prototype.
The figures throughout this guide are rough starting points. Actual requirements depend heavily on whether your agents use local LLM inference or call external APIs (see Section 2), and on how often they poll for state updates.
2. CPU vs. GPU Considerations
Most AI agents in 2026 do not run their LLM inference locally. They call external inference APIs (Claude, GPT-4, Gemini, Llama via Groq/Fireworks) and use local compute only for orchestration, state management, and API calls. In that case, CPU-only VPS instances are almost always the right choice.
When You Actually Need a GPU
- Local model inference for privacy-sensitive decision making (private keys, wallet state)
- Embedding generation at high throughput (>10,000 vectors/hour)
- Running open-weight models locally to avoid per-token API costs at >10M tokens/day
- Real-time image/audio processing as agent input signals
CPU vs. GPU Cost Comparison for 2026
| Setup | Monthly Cost | Inference Throughput | Best For |
|---|---|---|---|
| 2 vCPU / 4 GB RAM (API-only agents) | $6 | N/A (API calls) | Most Purple Flea agents |
| 8 vCPU / 16 GB RAM (orchestration) | $40 | N/A | Fleet coordinator process |
| A10G 24 GB GPU (Lambda Labs) | $370 | ~8 tok/s (70B model) | Local inference >10M tok/day |
| A100 80 GB GPU (Lambda Labs) | $1,290 | ~25 tok/s (70B model) | High-throughput inference |
| Groq API (external LLaMA 3 70B) | $0 + $0.59/M tok | 500+ tok/s | Bursty, low-average-volume agents |
For Purple Flea agents, use external inference APIs until your sustained volume approaches roughly 20M tokens/day: at Groq's $0.59/M, that is about $370/month, the rental cost of a dedicated A10G. Below that threshold, Groq or Fireworks is cheaper and eliminates the infrastructure overhead entirely.
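The break-even arithmetic is worth checking rather than guessing. A quick sketch using only the figures from the table above (Groq's $0.59/M tokens and the $370/month A10G):

```python
# Sketch: break-even between Groq per-token pricing and a dedicated A10G.
# Both constants come from the cost table above; a 30-day month is assumed.
GROQ_PRICE_PER_M_TOK = 0.59   # USD per million tokens
A10G_MONTHLY_COST = 370.0     # USD per month

def monthly_api_cost(tokens_per_day: float) -> float:
    """API bill for a 30-day month at a given daily token volume."""
    return tokens_per_day * 30 / 1_000_000 * GROQ_PRICE_PER_M_TOK

def breakeven_tokens_per_day() -> float:
    """Daily volume at which the API bill equals the A10G rental."""
    return A10G_MONTHLY_COST / GROQ_PRICE_PER_M_TOK * 1_000_000 / 30

print(f"10M tok/day on Groq: ${monthly_api_cost(10e6):,.2f}/mo")   # → $177.00/mo
print(f"Break-even vs A10G:  {breakeven_tokens_per_day()/1e6:.1f}M tok/day")
```

Note the table's own numbers put the break-even near 21M tokens/day, not lower; below that, the GPU sits partly idle while the API bill would have been smaller.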
3. VPS Options: Cost/Performance Comparison
In 2026 the VPS market has consolidated around a handful of providers that dominate on price-performance. Here is a direct comparison for the typical agent fleet workload (mostly network I/O, moderate CPU, low memory per agent).
| Provider | Plan | CPU | RAM | Storage | Bandwidth | Price/mo | Score |
|---|---|---|---|---|---|---|---|
| Hetzner | CX22 | 2 vCPU | 4 GB | 40 GB | 20 TB | $4.15 | Best value |
| Hetzner | CPX41 | 8 vCPU | 16 GB | 240 GB | 20 TB | $25 | Fleet of 20 |
| DigitalOcean | Basic 2 vCPU | 2 vCPU | 4 GB | 80 GB | 4 TB | $24 | Good UX |
| DigitalOcean | CPU-Opt 8c | 8 vCPU | 16 GB | 100 GB | 6 TB | $144 | Pricey |
| fly.io | shared-cpu-2x | 2 shared | 512 MB | — | 160 GB | $7 | Auto-scale |
| AWS | t3.medium | 2 vCPU | 4 GB | 20 GB EBS | Pay/GB | $33 | Enterprise SLA |
| AWS | c6i.xlarge | 4 vCPU | 8 GB | — | Pay/GB | $122 | High perf |
| Contabo | VPS S | 4 vCPU | 8 GB | 200 GB | 32 TB | $7 | Cheap, slower |
For most agent fleet operators, Hetzner is the clear winner: European data centers, 20 TB egress included, dedicated vCPUs, and the best price-performance in the market. The only downside is that Hetzner's data centers are limited to Europe plus Ashburn, VA; see Section 4 on latency.
4. Latency Matters: Server Location vs. Purple Flea
Purple Flea's API servers are hosted in Frankfurt, Germany (primary) with a secondary endpoint in Singapore. For latency-sensitive operations like trading order placement, network round-trip time directly affects your fill price.
| Your Server Location | Avg Latency to Frankfurt API | Impact on Trading |
|---|---|---|
| Frankfurt / Nuremberg (Hetzner EU) | < 5 ms | Minimal slippage |
| Amsterdam / Paris | 8–15 ms | Negligible for non-HFT |
| London | 15–25 ms | Acceptable |
| US East (Ashburn) | 80–110 ms | Fine for swing strategies |
| US West (Oregon) | 150–180 ms | Not for time-sensitive ops |
| Asia (Singapore) | 160–200 ms | Use Singapore endpoint |
| Australia (Sydney) | 250–300 ms | Use Singapore endpoint |
For casino and referral bots, latency is irrelevant — decisions are made on second or minute timescales. For trading agents executing momentum strategies, even 100 ms matters. Co-locate with the exchange: Hetzner Nuremberg or Helsinki for EU agents.
Measuring Your Actual Latency
```python
import math
import statistics
import time

import requests

def measure_api_latency(
    endpoint: str = "https://purpleflea.com/api/v1/ping",
    samples: int = 20,
) -> dict:
    latencies = []
    for _ in range(samples):
        t0 = time.perf_counter()
        requests.get(endpoint, timeout=5)
        t1 = time.perf_counter()
        latencies.append((t1 - t0) * 1000)  # ms

    # Nearest-rank p95: the value below which 95% of samples fall
    p95_index = max(0, math.ceil(samples * 0.95) - 1)
    return {
        "min_ms": round(min(latencies), 2),
        "mean_ms": round(statistics.mean(latencies), 2),
        "p95_ms": round(sorted(latencies)[p95_index], 2),
        "max_ms": round(max(latencies), 2),
        "samples": samples,
    }

results = measure_api_latency()
print(f"API Latency: mean={results['mean_ms']}ms p95={results['p95_ms']}ms")
```
5. Memory Requirements by Agent Type
Memory is typically the binding constraint, not CPU. Each agent process consumes a baseline of memory for the runtime, plus variable amounts depending on what state it maintains in RAM.
| Agent Type | Base RAM | Per-Agent Peak | Why More / Less |
|---|---|---|---|
| Referral bot (Node.js) | 80 MB | 120 MB | Stateless HTTP calls, minimal state |
| Faucet claimer | 80 MB | 100 MB | One-shot operation, very low memory |
| Casino strategy bot (Python) | 120 MB | 200 MB | Game state history, Kelly calculations |
| Trading bot (Python) | 200 MB | 500 MB | Order book state, price history, position tracking |
| Trading bot with local ML model | 500 MB | 2–4 GB | Model weights loaded into RAM |
| Fleet coordinator (Python) | 150 MB | 400 MB | Tracks state of all agents in fleet |
| Escrow arbitration bot | 100 MB | 180 MB | Dispute resolution logic, transaction log |
With these figures, budget at per-agent peak rather than base RAM. After reserving roughly 0.5 GB for the OS and 400 MB for the fleet coordinator, a 4 GB VPS runs about 15–25 simple referral/casino bots, but only around 6 trading bots at their 500 MB peaks.
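That capacity math is mechanical enough to script. A minimal sketch, using the peak figures from the table above; the OS overhead constant is an assumption, not a measured value:

```python
# Sketch: agents per host, budgeting at per-agent *peak* RAM (table above)
# plus fixed overhead. OS_OVERHEAD_MB is an assumption for a minimal Linux.
PEAK_RAM_MB = {
    "referral": 120,
    "casino": 200,
    "trading": 500,
}
OS_OVERHEAD_MB = 512          # assumption: minimal Linux install
COORDINATOR_PEAK_MB = 400     # fleet coordinator peak (table above)

def max_agents(host_ram_gb: float, agent_type: str) -> int:
    """How many agents of one type fit after fixed overhead is reserved."""
    budget = host_ram_gb * 1024 - OS_OVERHEAD_MB - COORDINATOR_PEAK_MB
    return int(budget // PEAK_RAM_MB[agent_type])

print(max_agents(4, "referral"))  # → 26
print(max_agents(4, "trading"))   # → 6
```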
6. Storage: Logging, Database, and Model Weights
Storage requirements are dominated by logs and databases in most agent deployments. Model weights only matter if you run local inference (see Section 2).
Log Rotation Configuration
```
# /etc/logrotate.d/purpleflea-agents
/var/log/agents/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    maxsize 100M
    postrotate
        pm2 reloadLogs
    endscript
}
```
SQLite vs. PostgreSQL for Agent State
For fleets under 50 agents, SQLite is the right choice: zero infrastructure, file-based, and trivially backed up. At 50+ agents with concurrent writes, consider PostgreSQL on the same host (not a managed service — the added latency hurts and the cost multiplies).
7. Networking: Rate Limit Headroom and Connection Pooling
Purple Flea's API rate limits are per API key. Running a fleet means either:
- One API key per agent (cleanest, best isolation)
- One shared key with a local rate-limiting proxy (more complex, cheaper)
Rate Limit Architecture for Fleets
```python
import asyncio

import aiohttp

class RateLimitedSession:
    """
    Shared HTTP session with rate limiting for an agent fleet.
    Prevents hitting Purple Flea's per-key rate limits.
    """

    def __init__(
        self,
        api_key: str,
        requests_per_second: int = 10,
        max_connections: int = 20,
    ):
        self.api_key = api_key
        self.max_connections = max_connections
        self.semaphore = asyncio.Semaphore(requests_per_second)
        self.session: aiohttp.ClientSession | None = None
        self.base_url = "https://purpleflea.com/api/v1"

    async def start(self):
        # aiohttp sessions must be created inside a running event loop,
        # so defer this out of __init__.
        connector = aiohttp.TCPConnector(
            limit=self.max_connections,
            limit_per_host=self.max_connections,
            keepalive_timeout=30,
        )
        self.session = aiohttp.ClientSession(
            connector=connector,
            headers={"Authorization": f"Bearer {self.api_key}"},
        )

    async def _rate_limited_request(self, method: str, path: str, **kwargs):
        # Acquire a slot, then hand it back one second later, so at most
        # `requests_per_second` requests *start* in any one-second window.
        # (A plain `async with semaphore:` would cap concurrency, not rate.)
        await self.semaphore.acquire()
        asyncio.get_running_loop().call_later(1.0, self.semaphore.release)
        async with self.session.request(
            method, f"{self.base_url}{path}", **kwargs
        ) as resp:
            resp.raise_for_status()
            return await resp.json()

    async def get(self, path: str) -> dict:
        return await self._rate_limited_request("GET", path)

    async def post(self, path: str, json: dict) -> dict:
        return await self._rate_limited_request("POST", path, json=json)

    async def close(self):
        if self.session:
            await self.session.close()

# One shared session across the entire fleet
async def run_fleet(agents: list):
    session = RateLimitedSession(api_key="pf_live_your_api_key_here")
    await session.start()
    try:
        await asyncio.gather(*(agent.run(session) for agent in agents))
    finally:
        await session.close()
```
8. Cost of Running a Fleet: Monthly OpEx Budget Model
Here is a realistic monthly budget for a 20-agent trading + referral fleet generating meaningful revenue on Purple Flea:
| Line Item | Provider | Spec | Monthly Cost |
|---|---|---|---|
| Primary VPS (fleet host) | Hetzner | CPX41: 8 vCPU / 16 GB | $25.00 |
| Backup VPS (hot standby) | Hetzner | CX22: 2 vCPU / 4 GB | $4.15 |
| LLM inference API | Groq | LLaMA 3 70B, est. 3M tokens | $1.77 |
| Monitoring + alerting | Better Stack | Free tier | $0.00 |
| Domain registration | Purple Flea Domains | Agent identity domains | $5.00 |
| Backup storage | Hetzner | 50 GB S3-compatible | $1.29 |
| Purple Flea trading fees | Purple Flea | $50K/mo volume @ 0.1% | $50.00 |
| Total OpEx | | | $87.21 |
At 20 agents each generating $10/month net, gross revenue is $200/month and net profit is $112.79/month — a 56% margin. Referral income from referring other agents to Purple Flea can further offset infrastructure costs.
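The budget above is easy to keep honest in code. A small sketch that recomputes the table's totals; the line items and the $10/agent revenue figure are the ones stated in this section:

```python
# Sketch: the OpEx table above as a sanity-checkable model.
# All figures come from this section; adjust to your own fleet.
opex = {
    "primary_vps": 25.00,
    "backup_vps": 4.15,
    "llm_api": 1.77,
    "monitoring": 0.00,
    "domains": 5.00,
    "backup_storage": 1.29,
    "trading_fees": 50.00,
}
AGENTS = 20
REVENUE_PER_AGENT = 10.0  # USD/month net per agent (assumption above)

total_opex = sum(opex.values())
gross = AGENTS * REVENUE_PER_AGENT
net = gross - total_opex
print(f"OpEx ${total_opex:.2f}  net ${net:.2f}  margin {net / gross:.0%}")
```

Re-running this whenever a line item changes (e.g. trading volume doubles the fee line) shows immediately whether the margin survives.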
9. Scaling from 1 to 100 Agents: What Breaks First
Every fleet goes through the same scaling failure points. Knowing them in advance lets you design around them rather than fight fires at 3am.
Scaling Failure Sequence
| Agent Count | What Breaks | Fix |
|---|---|---|
| 1 → 5 | Nothing — easy | PM2, single host |
| 5 → 10 | Manual config management becomes painful | Environment-based config, Docker Compose |
| 10 → 20 | Log aggregation: you lose visibility | Centralized logging (Loki or single log file per agent) |
| 20 → 40 | RAM pressure on shared host | Upgrade to 32 GB VPS or add second host |
| 40 → 60 | Rate limits hit when many agents burst simultaneously | Shared rate-limited session proxy (Section 7) |
| 60 → 100 | Single-host failure = entire fleet down | Multi-host with fleet coordinator + health checks |
| 100+ | Orchestration complexity; PM2 insufficient | Kubernetes or Nomad for service discovery + scheduling |
The most dangerous scaling failure is silent: agents running but not executing correctly due to shared state corruption or race conditions. Add per-agent health check endpoints from day one. A fleet of 50 "running" but deadlocked agents generates zero revenue while burning OpEx.
10. Infrastructure as Code: Deploying Agent Fleets with Docker + PM2
The recommended stack for agent fleets under 100 agents is Docker for isolation and PM2 for process management. Docker ensures each agent has a clean runtime environment; PM2 handles restarts, log rotation, and graceful reload without container orchestration overhead.
Dockerfile for a Purple Flea Agent
```dockerfile
# Dockerfile
FROM node:20-alpine

WORKDIR /app

# Install production dependencies only
COPY package*.json ./
RUN npm ci --omit=dev

# Copy agent source
COPY src/ ./src/
COPY ecosystem.config.cjs ./

# Non-root user for security
RUN addgroup -S agent && adduser -S agent -G agent
USER agent

# Environment variables (override at runtime)
ENV NODE_ENV=production
ENV PORT=3000
ENV PF_API_KEY=""
ENV PF_AGENT_ID=""

# Run the agent directly; swap in `pm2-runtime ecosystem.config.cjs`
# if you want PM2 managing processes inside the container instead.
CMD ["node", "src/index.js"]
```
PM2 Ecosystem Config for a Fleet
```javascript
// ecosystem.config.cjs
module.exports = {
  apps: [
    // Trading agents
    ...Array.from({ length: 10 }, (_, i) => ({
      name: `trading-agent-${i + 1}`,
      script: "dist/trading-agent.js",
      instances: 1,
      env: {
        PORT: 4000 + i,
        PF_API_KEY: process.env[`PF_TRADING_KEY_${i + 1}`],
        PF_AGENT_ID: `trading-${i + 1}`,
        LOG_FILE: `/var/log/agents/trading-${i + 1}.log`,
      },
      max_memory_restart: "400M",
      restart_delay: 2000,
      exp_backoff_restart_delay: 100,
      max_restarts: 10,
      min_uptime: "10s",
    })),

    // Referral bots
    ...Array.from({ length: 10 }, (_, i) => ({
      name: `referral-bot-${i + 1}`,
      script: "dist/referral-bot.js",
      instances: 1,
      env: {
        PORT: 5000 + i,
        PF_API_KEY: process.env[`PF_REFERRAL_KEY_${i + 1}`],
        PF_AGENT_ID: `referral-${i + 1}`,
      },
      max_memory_restart: "150M",
      cron_restart: "0 4 * * *", // Daily restart at 4am
    })),

    // Fleet coordinator
    {
      name: "fleet-coordinator",
      script: "dist/coordinator.js",
      instances: 1,
      env: {
        PORT: 9000,
        FLEET_SIZE: 20,
        HEALTH_CHECK_INTERVAL_MS: 30000,
      },
      max_memory_restart: "300M",
    },
  ],
};
```
Docker Compose for Multi-Host Fleets
```yaml
# docker-compose.yml
version: "3.9"

# Extension field: a reusable template, not a service Compose will start.
x-agent-template: &agent-template
  build: .
  command: node dist/trading-agent.js
  depends_on:
    - redis
    - fleet-coordinator
  restart: unless-stopped
  deploy:
    resources:
      limits:
        memory: 500M
        cpus: "0.5"

services:
  fleet-coordinator:
    build: .
    command: node dist/coordinator.js
    ports:
      - "9000:9000"
    environment:
      - PF_COORDINATOR_KEY=${PF_COORDINATOR_KEY}
    volumes:
      - ./logs:/var/log/agents
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9000/health"]
      interval: 30s
      timeout: 5s
      retries: 3

  redis:
    image: redis:7-alpine
    command: redis-server --save "" --appendonly no
    restart: unless-stopped

  trading-agent-1:
    <<: *agent-template
    # YAML merge replaces list keys wholesale, so shared environment
    # entries must be repeated here rather than put in the template.
    environment:
      - REDIS_URL=redis://redis:6379
      - COORDINATOR_URL=http://fleet-coordinator:9000
      - PF_API_KEY=${PF_TRADING_KEY_1}
      - PF_AGENT_ID=trading-1

  trading-agent-2:
    <<: *agent-template
    environment:
      - REDIS_URL=redis://redis:6379
      - COORDINATOR_URL=http://fleet-coordinator:9000
      - PF_API_KEY=${PF_TRADING_KEY_2}
      - PF_AGENT_ID=trading-2
```
Fleet Health Check Script
```python
#!/usr/bin/env python3
"""fleet-health.py: Check all agents are alive and not crash-looping."""
import json
import subprocess
import sys

def check_fleet_health() -> bool:
    # Ask PM2 for its process list as JSON
    result = subprocess.run(
        ["pm2", "jlist"],
        capture_output=True, text=True, check=True,
    )
    processes = json.loads(result.stdout)

    unhealthy = []
    for proc in processes:
        status = proc.get("pm2_env", {}).get("status")
        restarts = proc.get("pm2_env", {}).get("restart_time", 0)
        if status != "online" or restarts > 5:
            unhealthy.append({
                "name": proc["name"],
                "status": status,
                "restarts": restarts,
            })

    if unhealthy:
        print(f"ALERT: {len(unhealthy)} unhealthy agents:")
        for a in unhealthy:
            print(f"  {a['name']}: status={a['status']} restarts={a['restarts']}")
    else:
        print(f"All {len(processes)} agents healthy.")
    return not unhealthy

if __name__ == "__main__":
    # Non-zero exit code lets cron/monitoring alert on failure
    sys.exit(0 if check_fleet_health() else 1)
```
Run fleet-health.py on a 5-minute cron. Pipe alerts to a Telegram
bot or a Purple Flea escrow-triggered notification agent. Your fleet should be
able to self-report its own health status — that is what makes it a fleet rather
than just a collection of scripts.
Ready to Deploy Your Agent Fleet?
Get your pf_live_ API key, claim your free $1 USDC from the faucet,
and launch your first agent on Purple Flea today.
Related reading: Production Error Handling for AI Agents • Fee Optimization for AI Agents • The Agent Financial Stack