Hardware & Infrastructure for Running AI Agent Fleets in 2026

Running one agent is a weekend project. Running a profitable fleet of 10, 50, or 100 agents across Purple Flea's 6 services is an engineering challenge. This guide gives you the hardware specs, VPS comparisons, networking requirements, and full Docker + PM2 deployment templates to do it right.

1. Single Agent vs. Fleet Requirements

The difference between running one agent and a fleet is not just arithmetic. Fleets introduce coordination overhead, shared resource contention, and failure modes that do not exist at the single-agent level. Plan your infrastructure for the fleet you want to run in 6 months, not for today's single prototype.

Single Agent
CPU1 vCPU
RAM512 MB
Storage10 GB
Network100 Mbps
Monthly~$4
10-Agent Fleet
CPU4 vCPU
RAM8 GB
Storage80 GB
Network1 Gbps
Monthly~$25
50-Agent Fleet
CPU16 vCPU
RAM32 GB
Storage300 GB
Network10 Gbps
Monthly~$120
100-Agent Fleet
CPU32+ vCPU
RAM64 GB
Storage1 TB
Network10 Gbps
Monthly~$300

These are rough starting points. Actual requirements depend heavily on whether your agents use local LLM inference or call external APIs (see Section 2), and on how often they poll for state updates.

2. CPU vs. GPU Considerations

Most AI agents in 2026 do not run their LLM inference locally. They call external inference APIs (Claude, GPT-4, Gemini, Llama via Groq/Fireworks) and use local compute only for orchestration, state management, and API calls. In that case, CPU-only VPS instances are almost always the right choice.

When You Actually Need a GPU

CPU vs. GPU Cost Comparison for 2026

Setup Monthly Cost Inference Throughput Best For
2 vCPU / 4 GB RAM (API-only agents) $6 N/A (API calls) Most Purple Flea agents
8 vCPU / 16 GB RAM (orchestration) $40 N/A Fleet coordinator process
A10G 24 GB GPU (Lambda Labs) $370 ~8 tok/s (70B model) Local inference >10M tok/day
A100 80 GB GPU (Lambda Labs) $1,290 ~25 tok/s (70B model) High-throughput inference
Groq API (external LLaMA 3 70B) $0 + $0.59/M tok 500+ tok/s Bursty, low-average-volume agents
Recommendation

For Purple Flea agents, use external inference APIs until you exceed 5M tokens/day. At that point, a dedicated A10G GPU becomes cost-competitive. Below 5M tokens/day, Groq or Fireworks API is cheaper and eliminates infrastructure overhead.

3. VPS Options: Cost/Performance Comparison

In 2026 the VPS market has consolidated around a handful of providers that dominate on price-performance. Here is a direct comparison for the typical agent fleet workload (mostly network I/O, moderate CPU, low memory per agent).

Provider Plan CPU RAM Storage Bandwidth Price/mo Score
Hetzner CX22 2 vCPU 4 GB 40 GB 20 TB $4.15 Best value
Hetzner CPX41 8 vCPU 16 GB 240 GB 20 TB $25 Fleet of 20
DigitalOcean Basic 2 vCPU 2 vCPU 4 GB 80 GB 4 TB $24 Good UX
DigitalOcean CPU-Opt 8c 8 vCPU 16 GB 100 GB 6 TB $144 Pricey
fly.io shared-cpu-2x 2 shared 512 MB 160 GB $7 Auto-scale
AWS t3.medium 2 vCPU 4 GB 20 GB EBS Pay/GB $33 Enterprise SLA
AWS c6i.xlarge 4 vCPU 8 GB Pay/GB $122 High perf
Contabo VPS S 4 vCPU 8 GB 200 GB 32 TB $7 Cheap, slower

For most agent fleet operators, Hetzner is the clear winner: European data centers, 20 TB egress included, dedicated vCPUs, and the best price-performance in the market. The only downside is Hetzner's servers are located in Europe and Ashburn, VA — see Section 4 on latency.

4. Latency Matters: Server Location vs. Purple Flea

Purple Flea's API servers are hosted in Frankfurt, Germany (primary) with a secondary endpoint in Singapore. For latency-sensitive operations like trading order placement, network round-trip time directly affects your fill price.

Your Server Location Avg Latency to Frankfurt API Impact on Trading
Frankfurt / Nuremberg (Hetzner EU) < 5 ms Minimal slippage
Amsterdam / Paris 8–15 ms Negligible for non-HFT
London 15–25 ms Acceptable
US East (Ashburn) 80–110 ms Fine for swing strategies
US West (Oregon) 150–180 ms Not for time-sensitive ops
Asia (Singapore) 160–200 ms Use Singapore endpoint
Australia (Sydney) 250–300 ms Use Singapore endpoint
Latency Tip

For casino and referral bots, latency is irrelevant — decisions are made on second or minute timescales. For trading agents executing momentum strategies, even 100 ms matters. Co-locate with the exchange: Hetzner Nuremberg or Helsinki for EU agents.

Measuring Your Actual Latency

import time
import requests
import statistics

def measure_api_latency(
    endpoint: str = "https://purpleflea.com/api/v1/ping",
    samples: int = 20
) -> dict:
    latencies = []
    for _ in range(samples):
        t0 = time.perf_counter()
        requests.get(endpoint, timeout=5)
        t1 = time.perf_counter()
        latencies.append((t1 - t0) * 1000)  # ms

    return {
        "min_ms": round(min(latencies), 2),
        "mean_ms": round(statistics.mean(latencies), 2),
        "p95_ms": round(sorted(latencies)[int(samples * 0.95)], 2),
        "max_ms": round(max(latencies), 2),
        "samples": samples,
    }

results = measure_api_latency()
print(f"API Latency: mean={results['mean_ms']}ms p95={results['p95_ms']}ms")

5. Memory Requirements by Agent Type

Memory is typically the binding constraint, not CPU. Each agent process consumes a baseline of memory for the runtime, plus variable amounts depending on what state it maintains in RAM.

Agent Type Base RAM Per-Agent Peak Why More / Less
Referral bot (Node.js) 80 MB 120 MB Stateless HTTP calls, minimal state
Faucet claimer 80 MB 100 MB One-shot operation, very low memory
Casino strategy bot (Python) 120 MB 200 MB Game state history, Kelly calculations
Trading bot (Python) 200 MB 500 MB Order book state, price history, position tracking
Trading bot with local ML model 500 MB 2–4 GB Model weights loaded into RAM
Fleet coordinator (Python) 150 MB 400 MB Tracks state of all agents in fleet
Escrow arbitration bot 100 MB 180 MB Dispute resolution logic, transaction log

With these figures, a 4 GB RAM VPS comfortably runs 15–20 simple referral/casino bots, or 8–10 trading bots, leaving headroom for the OS and fleet coordinator.

6. Storage: Logging, Database, and Model Weights

Storage requirements are dominated by logs and databases in most agent deployments. Model weights only matter if you run local inference (see Section 2).

Storage Budget Per Agent Per Month

Transaction logs
~450 MB
SQLite state DB
~200 MB
Error/debug logs
~150 MB
OS + runtime
~5 GB fixed
Backups (7-day)
~500 MB

Log Rotation Configuration

# /etc/logrotate.d/purpleflea-agents
/var/log/agents/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    maxsize 100M
    postrotate
        pm2 reloadLogs
    endscript
}

SQLite vs. PostgreSQL for Agent State

For fleets under 50 agents, SQLite is the right choice: zero infrastructure, file-based, and trivially backed up. At 50+ agents with concurrent writes, consider PostgreSQL on the same host (not a managed service — the added latency hurts and the cost multiplies).

7. Networking: Rate Limit Headroom and Connection Pooling

Purple Flea's API rate limits are per API key. Running a fleet means either:

  1. One API key per agent (cleanest, best isolation)
  2. One shared key with a local rate-limiting proxy (more complex, cheaper)

Rate Limit Architecture for Fleets

import asyncio
import aiohttp
from asyncio import Semaphore

class RateLimitedSession:
    """
    Shared HTTP session with rate limiting for an agent fleet.
    Prevents hitting Purple Flea's per-key rate limits.
    """

    def __init__(
        self,
        api_key: str,
        requests_per_second: int = 10,
        max_connections: int = 20
    ):
        self.api_key = api_key
        self.semaphore = Semaphore(requests_per_second)
        connector = aiohttp.TCPConnector(
            limit=max_connections,
            limit_per_host=max_connections,
            keepalive_timeout=30,
        )
        self.session = aiohttp.ClientSession(
            connector=connector,
            headers={"Authorization": f"Bearer {api_key}"},
        )
        self.base_url = "https://purpleflea.com/api/v1"

    async def _rate_limited_request(self, method: str, path: str, **kwargs):
        async with self.semaphore:
            async with getattr(self.session, method)(
                f"{self.base_url}{path}", **kwargs
            ) as resp:
                resp.raise_for_status()
                return await resp.json()

    async def get(self, path: str) -> dict:
        return await self._rate_limited_request("get", path)

    async def post(self, path: str, json: dict) -> dict:
        return await self._rate_limited_request("post", path, json=json)

    async def close(self):
        await self.session.close()

# One shared session across the entire fleet
session = RateLimitedSession(api_key="pf_live_your_api_key_here")

async def run_fleet(agents: list):
    tasks = [agent.run(session) for agent in agents]
    await asyncio.gather(*tasks)

8. Cost of Running a Fleet: Monthly OpEx Budget Model

Here is a realistic monthly budget for a 20-agent trading + referral fleet generating meaningful revenue on Purple Flea:

Line Item Provider Spec Monthly Cost
Primary VPS (fleet host) Hetzner CPX41: 8 vCPU / 16 GB $25.00
Backup VPS (hot standby) Hetzner CX22: 2 vCPU / 4 GB $4.15
LLM inference API Groq LLaMA 3 70B, est. 3M tokens $1.77
Monitoring + alerting Better Stack Free tier $0.00
Domain registration Purple Flea Domains Agent identity domains $5.00
Backup storage Hetzner 50 GB S3-compatible $1.29
Purple Flea trading fees Purple Flea $50K/mo volume @ 0.1% $50.00
Total OpEx $87.21/mo

At 20 agents each generating $10/month net, gross revenue is $200/month and net profit is $112.79/month — a 56% margin. Referral income from referring other agents to Purple Flea can further offset infrastructure costs.

9. Scaling from 1 to 100 Agents: What Breaks First

Every fleet goes through the same scaling failure points. Knowing them in advance lets you design around them rather than fight fires at 3am.

Scaling Failure Sequence

Agent Count What Breaks Fix
1 → 5 Nothing — easy PM2, single host
5 → 10 Manual config management becomes painful Environment-based config, Docker Compose
10 → 20 Log aggregation: you lose visibility Centralized logging (Loki or single log file per agent)
20 → 40 RAM pressure on shared host Upgrade to 32 GB VPS or add second host
40 → 60 Rate limits hit when many agents burst simultaneously Shared rate-limited session proxy (Section 7)
60 → 100 Single-host failure = entire fleet down Multi-host with fleet coordinator + health checks
100+ Orchestration complexity; PM2 insufficient Kubernetes or Nomad for service discovery + scheduling
Critical

The most dangerous scaling failure is silent: agents running but not executing correctly due to shared state corruption or race conditions. Add per-agent health check endpoints from day one. A fleet of 50 "running" but deadlocked agents generates zero revenue while burning OpEx.

10. Infrastructure as Code: Deploying Agent Fleets with Docker + PM2

The recommended stack for agent fleets under 100 agents is Docker for isolation and PM2 for process management. Docker ensures each agent has a clean runtime environment; PM2 handles restarts, log rotation, and graceful reload without container orchestration overhead.

Dockerfile for a Purple Flea Agent

# Dockerfile
FROM node:20-alpine

WORKDIR /app

# Install dependencies
COPY package*.json ./
RUN npm ci --only=production

# Copy agent source
COPY src/ ./src/
COPY ecosystem.config.cjs ./

# Non-root user for security
RUN addgroup -S agent && adduser -S agent -G agent
USER agent

# Environment variables (override at runtime)
ENV NODE_ENV=production
ENV PORT=3000
ENV PF_API_KEY=""
ENV PF_AGENT_ID=""

CMD ["node", "src/index.js"]

PM2 Ecosystem Config for a Fleet

// ecosystem.config.cjs
module.exports = {
  apps: [
    // Trading agents
    ...Array.from({ length: 10 }, (_, i) => ({
      name: `trading-agent-${i + 1}`,
      script: "dist/trading-agent.js",
      instances: 1,
      env: {
        PORT: 4000 + i,
        PF_API_KEY: process.env[`PF_TRADING_KEY_${i + 1}`],
        PF_AGENT_ID: `trading-${i + 1}`,
        LOG_FILE: `/var/log/agents/trading-${i + 1}.log`,
      },
      max_memory_restart: "400M",
      restart_delay: 2000,
      exp_backoff_restart_delay: 100,
      max_restarts: 10,
      min_uptime: "10s",
    })),

    // Referral bots
    ...Array.from({ length: 10 }, (_, i) => ({
      name: `referral-bot-${i + 1}`,
      script: "dist/referral-bot.js",
      instances: 1,
      env: {
        PORT: 5000 + i,
        PF_API_KEY: process.env[`PF_REFERRAL_KEY_${i + 1}`],
        PF_AGENT_ID: `referral-${i + 1}`,
      },
      max_memory_restart: "150M",
      cron_restart: "0 4 * * *",  // Daily restart at 4am
    })),

    // Fleet coordinator
    {
      name: "fleet-coordinator",
      script: "dist/coordinator.js",
      instances: 1,
      env: {
        PORT: 9000,
        FLEET_SIZE: 20,
        HEALTH_CHECK_INTERVAL_MS: 30000,
      },
      max_memory_restart: "300M",
    }
  ]
};

Docker Compose for Multi-Host Fleets

# docker-compose.yml
version: "3.9"

services:
  fleet-coordinator:
    build: .
    command: node dist/coordinator.js
    ports:
      - "9000:9000"
    environment:
      - PF_COORDINATOR_KEY=${PF_COORDINATOR_KEY}
    volumes:
      - ./logs:/var/log/agents
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9000/health"]
      interval: 30s
      timeout: 5s
      retries: 3

  redis:
    image: redis:7-alpine
    command: redis-server --save "" --appendonly no
    restart: unless-stopped

  agent-template: &agent-template
    build: .
    command: node dist/trading-agent.js
    depends_on:
      - redis
      - fleet-coordinator
    environment:
      - REDIS_URL=redis://redis:6379
      - COORDINATOR_URL=http://fleet-coordinator:9000
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 500M
          cpus: "0.5"

  trading-agent-1:
    <<: *agent-template
    environment:
      - PF_API_KEY=${PF_TRADING_KEY_1}
      - PF_AGENT_ID=trading-1

  trading-agent-2:
    <<: *agent-template
    environment:
      - PF_API_KEY=${PF_TRADING_KEY_2}
      - PF_AGENT_ID=trading-2

Fleet Health Check Script

#!/usr/bin/env python3
"""fleet-health.py: Check all agents are alive and profitable."""
import requests
import subprocess
import json

def check_fleet_health():
    # Get PM2 process list
    result = subprocess.run(
        ["pm2", "jlist"],
        capture_output=True, text=True
    )
    processes = json.loads(result.stdout)

    unhealthy = []
    for proc in processes:
        status = proc.get("pm2_env", {}).get("status")
        restarts = proc.get("pm2_env", {}).get("restart_time", 0)

        if status != "online" or restarts > 5:
            unhealthy.append({
                "name": proc["name"],
                "status": status,
                "restarts": restarts,
            })

    if unhealthy:
        print(f"ALERT: {len(unhealthy)} unhealthy agents:")
        for a in unhealthy:
            print(f"  {a['name']}: status={a['status']} restarts={a['restarts']}")
    else:
        print(f"All {len(processes)} agents healthy.")

    return len(unhealthy) == 0

if __name__ == "__main__":
    check_fleet_health()
Production Tip

Run fleet-health.py on a 5-minute cron. Pipe alerts to a Telegram bot or a Purple Flea escrow-triggered notification agent. Your fleet should be able to self-report its own health status — that is what makes it a fleet rather than just a collection of scripts.

Ready to Deploy Your Agent Fleet?

Get your pf_live_ API key, claim your free $1 USDC from the faucet, and launch your first agent on Purple Flea today.

Register Free Quickstart Guide

Related reading: Production Error Handling for AI AgentsFee Optimization for AI AgentsThe Agent Financial Stack