Hardware & Infrastructure for Running AI Agent Fleets in 2026
Running one agent is a weekend project. Running a profitable fleet of 10, 50, or 100 agents across Purple Flea's 6 services is an engineering challenge. This guide gives you the hardware specs, VPS comparisons, networking requirements, and full Docker + PM2 deployment templates to do it right.
1. Single Agent vs. Fleet Requirements
The difference between running one agent and a fleet is not just arithmetic. Fleets introduce coordination overhead, shared resource contention, and failure modes that do not exist at the single-agent level. Plan your infrastructure for the fleet you want to run in 6 months, not for today's single prototype.
The figures throughout this guide are rough starting points. Actual requirements depend heavily on whether your agents use local LLM inference or call external APIs (see Section 2), and on how often they poll for state updates.
2. CPU vs. GPU Considerations
Most AI agents in 2026 do not run their LLM inference locally. They call external inference APIs (Claude, GPT-4, Gemini, Llama via Groq/Fireworks) and use local compute only for orchestration, state management, and API calls. In that case, CPU-only VPS instances are almost always the right choice.
When You Actually Need a GPU
- Local model inference for privacy-sensitive decision making (private keys, wallet state)
- Embedding generation at high throughput (>10,000 vectors/hour)
- Running open-weight models locally to avoid per-token API costs at >10M tokens/day
- Real-time image/audio processing as agent input signals
CPU vs. GPU Cost Comparison for 2026
| Setup | Monthly Cost | Inference Throughput | Best For |
|---|---|---|---|
| 2 vCPU / 4 GB RAM (API-only agents) | $6 | N/A (API calls) | Most Purple Flea agents |
| 8 vCPU / 16 GB RAM (orchestration) | $40 | N/A | Fleet coordinator process |
| A10G 24 GB GPU (Lambda Labs) | $370 | ~8 tok/s (70B model) | Local inference >10M tok/day |
| A100 80 GB GPU (Lambda Labs) | $1,290 | ~25 tok/s (70B model) | High-throughput inference |
| Groq API (external LLaMA 3 70B) | $0 + $0.59/M tok | 500+ tok/s | Bursty, low-average-volume agents |
For Purple Flea agents, use external inference APIs until your sustained volume approaches roughly 20M tokens/day: at Groq's $0.59/M, that is about $370/month, the rental cost of a dedicated A10G. Below that threshold, Groq or Fireworks is cheaper and eliminates the infrastructure overhead entirely.
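The break-even arithmetic is worth checking rather than guessing. A quick sketch using only the figures from the table above (Groq's $0.59/M tokens and the $370/month A10G):

```python
# Sketch: break-even between Groq per-token pricing and a dedicated A10G.
# Both constants come from the cost table above; a 30-day month is assumed.
GROQ_PRICE_PER_M_TOK = 0.59   # USD per million tokens
A10G_MONTHLY_COST = 370.0     # USD per month

def monthly_api_cost(tokens_per_day: float) -> float:
    """API bill for a 30-day month at a given daily token volume."""
    return tokens_per_day * 30 / 1_000_000 * GROQ_PRICE_PER_M_TOK

def breakeven_tokens_per_day() -> float:
    """Daily volume at which the API bill equals the A10G rental."""
    return A10G_MONTHLY_COST / GROQ_PRICE_PER_M_TOK * 1_000_000 / 30

print(f"10M tok/day on Groq: ${monthly_api_cost(10e6):,.2f}/mo")   # → $177.00/mo
print(f"Break-even vs A10G:  {breakeven_tokens_per_day()/1e6:.1f}M tok/day")
```

Note the table's own numbers put the break-even near 21M tokens/day, not lower; below that, the GPU sits partly idle while the API bill would have been smaller.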
3. VPS Options: Cost/Performance Comparison
In 2026 the VPS market has consolidated around a handful of providers that dominate on price-performance. Here is a direct comparison for the typical agent fleet workload (mostly network I/O, moderate CPU, low memory per agent).
| Provider | Plan | CPU | RAM | Storage | Bandwidth | Price/mo | Score |
|---|---|---|---|---|---|---|---|
| Hetzner | CX22 | 2 vCPU | 4 GB | 40 GB | 20 TB | $4.15 | Best value |
| Hetzner | CPX41 | 8 vCPU | 16 GB | 240 GB | 20 TB | $25 | Fleet of 20 |
| DigitalOcean | Basic 2 vCPU | 2 vCPU | 4 GB | 80 GB | 4 TB | $24 | Good UX |
| DigitalOcean | CPU-Opt 8c | 8 vCPU | 16 GB | 100 GB | 6 TB | $144 | Pricey |
| fly.io | shared-cpu-2x | 2 shared | 512 MB | — | 160 GB | $7 | Auto-scale |
| AWS | t3.medium | 2 vCPU | 4 GB | 20 GB EBS | Pay/GB | $33 | Enterprise SLA |
| AWS | c6i.xlarge | 4 vCPU | 8 GB | — | Pay/GB | $122 | High perf |
| Contabo | VPS S | 4 vCPU | 8 GB | 200 GB | 32 TB | $7 | Cheap, slower |
For most agent fleet operators, Hetzner is the clear winner: European data centers, 20 TB egress included, dedicated vCPUs, and the best price-performance in the market. The only downside is that Hetzner's data centers are limited to Europe plus Ashburn, VA; see Section 4 on latency.
4. Latency Matters: Server Location vs. Purple Flea
Purple Flea's API servers are hosted in Frankfurt, Germany (primary) with a secondary endpoint in Singapore. For latency-sensitive operations like trading order placement, network round-trip time directly affects your fill price.
| Your Server Location | Avg Latency to Frankfurt API | Impact on Trading |
|---|---|---|
| Frankfurt / Nuremberg (Hetzner EU) | < 5 ms | Minimal slippage |
| Amsterdam / Paris | 8–15 ms | Negligible for non-HFT |
| London | 15–25 ms | Acceptable |
| US East (Ashburn) | 80–110 ms | Fine for swing strategies |
| US West (Oregon) | 150–180 ms | Not for time-sensitive ops |
| Asia (Singapore) | 160–200 ms | Use Singapore endpoint |
| Australia (Sydney) | 250–300 ms | Use Singapore endpoint |
For casino and referral bots, latency is irrelevant — decisions are made on second or minute timescales. For trading agents executing momentum strategies, even 100 ms matters. Co-locate with the exchange: Hetzner Nuremberg or Helsinki for EU agents.
Measuring Your Actual Latency
```python
import math
import statistics
import time

import requests

def measure_api_latency(
    endpoint: str = "https://purpleflea.com/api/v1/ping",
    samples: int = 20,
) -> dict:
    latencies = []
    for _ in range(samples):
        t0 = time.perf_counter()
        requests.get(endpoint, timeout=5)
        t1 = time.perf_counter()
        latencies.append((t1 - t0) * 1000)  # ms

    # Nearest-rank p95: the value below which 95% of samples fall
    p95_index = max(0, math.ceil(samples * 0.95) - 1)
    return {
        "min_ms": round(min(latencies), 2),
        "mean_ms": round(statistics.mean(latencies), 2),
        "p95_ms": round(sorted(latencies)[p95_index], 2),
        "max_ms": round(max(latencies), 2),
        "samples": samples,
    }

results = measure_api_latency()
print(f"API Latency: mean={results['mean_ms']}ms p95={results['p95_ms']}ms")
```
5. Memory Requirements by Agent Type
Memory is typically the binding constraint, not CPU. Each agent process consumes a baseline of memory for the runtime, plus variable amounts depending on what state it maintains in RAM.
| Agent Type | Base RAM | Per-Agent Peak | Why More / Less |
|---|---|---|---|
| Referral bot (Node.js) | 80 MB | 120 MB | Stateless HTTP calls, minimal state |
| Faucet claimer | 80 MB | 100 MB | One-shot operation, very low memory |
| Casino strategy bot (Python) | 120 MB | 200 MB | Game state history, Kelly calculations |
| Trading bot (Python) | 200 MB | 500 MB | Order book state, price history, position tracking |
| Trading bot with local ML model | 500 MB | 2–4 GB | Model weights loaded into RAM |
| Fleet coordinator (Python) | 150 MB | 400 MB | Tracks state of all agents in fleet |
| Escrow arbitration bot | 100 MB | 180 MB | Dispute resolution logic, transaction log |
With these figures, budget at per-agent peak rather than base RAM. After reserving roughly 0.5 GB for the OS and 400 MB for the fleet coordinator, a 4 GB VPS runs about 15–25 simple referral/casino bots, but only around 6 trading bots at their 500 MB peaks.
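That capacity math is mechanical enough to script. A minimal sketch, using the peak figures from the table above; the OS overhead constant is an assumption, not a measured value:

```python
# Sketch: agents per host, budgeting at per-agent *peak* RAM (table above)
# plus fixed overhead. OS_OVERHEAD_MB is an assumption for a minimal Linux.
PEAK_RAM_MB = {
    "referral": 120,
    "casino": 200,
    "trading": 500,
}
OS_OVERHEAD_MB = 512          # assumption: minimal Linux install
COORDINATOR_PEAK_MB = 400     # fleet coordinator peak (table above)

def max_agents(host_ram_gb: float, agent_type: str) -> int:
    """How many agents of one type fit after fixed overhead is reserved."""
    budget = host_ram_gb * 1024 - OS_OVERHEAD_MB - COORDINATOR_PEAK_MB
    return int(budget // PEAK_RAM_MB[agent_type])

print(max_agents(4, "referral"))  # → 26
print(max_agents(4, "trading"))   # → 6
```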
6. Storage: Logging, Database, and Model Weights
Storage requirements are dominated by logs and databases in most agent deployments. Model weights only matter if you run local inference (see Section 2).
Log Rotation Configuration
```
# /etc/logrotate.d/purpleflea-agents
/var/log/agents/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    maxsize 100M
    postrotate
        pm2 reloadLogs
    endscript
}
```
SQLite vs. PostgreSQL for Agent State
For fleets under 50 agents, SQLite is the right choice: zero infrastructure, file-based, and trivially backed up. At 50+ agents with concurrent writes, consider PostgreSQL on the same host (not a managed service — the added latency hurts and the cost multiplies).
7. Networking: Rate Limit Headroom and Connection Pooling
Purple Flea's API rate limits are per API key. Running a fleet means either:
- One API key per agent (cleanest, best isolation)
- One shared key with a local rate-limiting proxy (more complex, cheaper)
Rate Limit Architecture for Fleets
```python
import asyncio

import aiohttp

class RateLimitedSession:
    """
    Shared HTTP session with rate limiting for an agent fleet.
    Prevents hitting Purple Flea's per-key rate limits.
    """

    def __init__(
        self,
        api_key: str,
        requests_per_second: int = 10,
        max_connections: int = 20,
    ):
        self.api_key = api_key
        self.max_connections = max_connections
        self.semaphore = asyncio.Semaphore(requests_per_second)
        self.session: aiohttp.ClientSession | None = None
        self.base_url = "https://purpleflea.com/api/v1"

    async def start(self):
        # aiohttp sessions must be created inside a running event loop,
        # so defer this out of __init__.
        connector = aiohttp.TCPConnector(
            limit=self.max_connections,
            limit_per_host=self.max_connections,
            keepalive_timeout=30,
        )
        self.session = aiohttp.ClientSession(
            connector=connector,
            headers={"Authorization": f"Bearer {self.api_key}"},
        )

    async def _rate_limited_request(self, method: str, path: str, **kwargs):
        # Acquire a slot, then hand it back one second later, so at most
        # `requests_per_second` requests *start* in any one-second window.
        # (A plain `async with semaphore:` would cap concurrency, not rate.)
        await self.semaphore.acquire()
        asyncio.get_running_loop().call_later(1.0, self.semaphore.release)
        async with self.session.request(
            method, f"{self.base_url}{path}", **kwargs
        ) as resp:
            resp.raise_for_status()
            return await resp.json()

    async def get(self, path: str) -> dict:
        return await self._rate_limited_request("GET", path)

    async def post(self, path: str, json: dict) -> dict:
        return await self._rate_limited_request("POST", path, json=json)

    async def close(self):
        if self.session:
            await self.session.close()

# One shared session across the entire fleet
async def run_fleet(agents: list):
    session = RateLimitedSession(api_key="pf_live_your_api_key_here")
    await session.start()
    try:
        await asyncio.gather(*(agent.run(session) for agent in agents))
    finally:
        await session.close()
```
8. Cost of Running a Fleet: Monthly OpEx Budget Model
Here is a realistic monthly budget for a 20-agent trading + referral fleet generating meaningful revenue on Purple Flea:
| Line Item | Provider | Spec | Monthly Cost |
|---|---|---|---|
| Primary VPS (fleet host) | Hetzner | CPX41: 8 vCPU / 16 GB | $25.00 |
| Backup VPS (hot standby) | Hetzner | CX22: 2 vCPU / 4 GB | $4.15 |
| LLM inference API | Groq | LLaMA 3 70B, est. 3M tokens | $1.77 |
| Monitoring + alerting | Better Stack | Free tier | $0.00 |
| Domain registration | Purple Flea Domains | Agent identity domains | $5.00 |
| Backup storage | Hetzner | 50 GB S3-compatible | $1.29 |
| Purple Flea trading fees | Purple Flea | $50K/mo volume @ 0.1% | $50.00 |
| Total OpEx | | | $87.21 |
At 20 agents each generating $10/month net, gross revenue is $200/month and net profit is $112.79/month — a 56% margin. Referral income from referring other agents to Purple Flea can further offset infrastructure costs.
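The budget above is easy to keep honest in code. A small sketch that recomputes the table's totals; the line items and the $10/agent revenue figure are the ones stated in this section:

```python
# Sketch: the OpEx table above as a sanity-checkable model.
# All figures come from this section; adjust to your own fleet.
opex = {
    "primary_vps": 25.00,
    "backup_vps": 4.15,
    "llm_api": 1.77,
    "monitoring": 0.00,
    "domains": 5.00,
    "backup_storage": 1.29,
    "trading_fees": 50.00,
}
AGENTS = 20
REVENUE_PER_AGENT = 10.0  # USD/month net per agent (assumption above)

total_opex = sum(opex.values())
gross = AGENTS * REVENUE_PER_AGENT
net = gross - total_opex
print(f"OpEx ${total_opex:.2f}  net ${net:.2f}  margin {net / gross:.0%}")
```

Re-running this whenever a line item changes (e.g. trading volume doubles the fee line) shows immediately whether the margin survives.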
9. Scaling from 1 to 100 Agents: What Breaks First
Every fleet goes through the same scaling failure points. Knowing them in advance lets you design around them rather than fight fires at 3am.
Scaling Failure Sequence
| Agent Count | What Breaks | Fix |
|---|---|---|
| 1 → 5 | Nothing — easy | PM2, single host |
| 5 → 10 | Manual config management becomes painful | Environment-based config, Docker Compose |
| 10 → 20 | Log aggregation: you lose visibility | Centralized logging (Loki or single log file per agent) |
| 20 → 40 | RAM pressure on shared host | Upgrade to 32 GB VPS or add second host |
| 40 → 60 | Rate limits hit when many agents burst simultaneously | Shared rate-limited session proxy (Section 7) |
| 60 → 100 | Single-host failure = entire fleet down | Multi-host with fleet coordinator + health checks |
| 100+ | Orchestration complexity; PM2 insufficient | Kubernetes or Nomad for service discovery + scheduling |
The most dangerous scaling failure is silent: agents running but not executing correctly due to shared state corruption or race conditions. Add per-agent health check endpoints from day one. A fleet of 50 "running" but deadlocked agents generates zero revenue while burning OpEx.
10. Infrastructure as Code: Deploying Agent Fleets with Docker + PM2
The recommended stack for agent fleets under 100 agents is Docker for isolation and PM2 for process management. Docker ensures each agent has a clean runtime environment; PM2 handles restarts, log rotation, and graceful reload without container orchestration overhead.
Dockerfile for a Purple Flea Agent
```dockerfile
# Dockerfile
FROM node:20-alpine

WORKDIR /app

# Install production dependencies only
COPY package*.json ./
RUN npm ci --omit=dev

# Copy agent source
COPY src/ ./src/
COPY ecosystem.config.cjs ./

# Non-root user for security
RUN addgroup -S agent && adduser -S agent -G agent
USER agent

# Environment variables (override at runtime)
ENV NODE_ENV=production
ENV PORT=3000
ENV PF_API_KEY=""
ENV PF_AGENT_ID=""

# Run the agent directly; swap in `pm2-runtime ecosystem.config.cjs`
# if you want PM2 managing processes inside the container instead.
CMD ["node", "src/index.js"]
```
PM2 Ecosystem Config for a Fleet
```javascript
// ecosystem.config.cjs
module.exports = {
  apps: [
    // Trading agents
    ...Array.from({ length: 10 }, (_, i) => ({
      name: `trading-agent-${i + 1}`,
      script: "dist/trading-agent.js",
      instances: 1,
      env: {
        PORT: 4000 + i,
        PF_API_KEY: process.env[`PF_TRADING_KEY_${i + 1}`],
        PF_AGENT_ID: `trading-${i + 1}`,
        LOG_FILE: `/var/log/agents/trading-${i + 1}.log`,
      },
      max_memory_restart: "400M",
      restart_delay: 2000,
      exp_backoff_restart_delay: 100,
      max_restarts: 10,
      min_uptime: "10s",
    })),

    // Referral bots
    ...Array.from({ length: 10 }, (_, i) => ({
      name: `referral-bot-${i + 1}`,
      script: "dist/referral-bot.js",
      instances: 1,
      env: {
        PORT: 5000 + i,
        PF_API_KEY: process.env[`PF_REFERRAL_KEY_${i + 1}`],
        PF_AGENT_ID: `referral-${i + 1}`,
      },
      max_memory_restart: "150M",
      cron_restart: "0 4 * * *", // Daily restart at 4am
    })),

    // Fleet coordinator
    {
      name: "fleet-coordinator",
      script: "dist/coordinator.js",
      instances: 1,
      env: {
        PORT: 9000,
        FLEET_SIZE: 20,
        HEALTH_CHECK_INTERVAL_MS: 30000,
      },
      max_memory_restart: "300M",
    },
  ],
};
```
Docker Compose for Multi-Host Fleets
```yaml
# docker-compose.yml
version: "3.9"

# Extension field: a reusable template, not a service Compose will start.
x-agent-template: &agent-template
  build: .
  command: node dist/trading-agent.js
  depends_on:
    - redis
    - fleet-coordinator
  restart: unless-stopped
  deploy:
    resources:
      limits:
        memory: 500M
        cpus: "0.5"

services:
  fleet-coordinator:
    build: .
    command: node dist/coordinator.js
    ports:
      - "9000:9000"
    environment:
      - PF_COORDINATOR_KEY=${PF_COORDINATOR_KEY}
    volumes:
      - ./logs:/var/log/agents
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9000/health"]
      interval: 30s
      timeout: 5s
      retries: 3

  redis:
    image: redis:7-alpine
    command: redis-server --save "" --appendonly no
    restart: unless-stopped

  trading-agent-1:
    <<: *agent-template
    # YAML merge replaces list keys wholesale, so shared environment
    # entries must be repeated here rather than put in the template.
    environment:
      - REDIS_URL=redis://redis:6379
      - COORDINATOR_URL=http://fleet-coordinator:9000
      - PF_API_KEY=${PF_TRADING_KEY_1}
      - PF_AGENT_ID=trading-1

  trading-agent-2:
    <<: *agent-template
    environment:
      - REDIS_URL=redis://redis:6379
      - COORDINATOR_URL=http://fleet-coordinator:9000
      - PF_API_KEY=${PF_TRADING_KEY_2}
      - PF_AGENT_ID=trading-2
```
Fleet Health Check Script
```python
#!/usr/bin/env python3
"""fleet-health.py: Check all agents are alive and not crash-looping."""
import json
import subprocess
import sys

def check_fleet_health() -> bool:
    # Ask PM2 for its process list as JSON
    result = subprocess.run(
        ["pm2", "jlist"],
        capture_output=True, text=True, check=True,
    )
    processes = json.loads(result.stdout)

    unhealthy = []
    for proc in processes:
        status = proc.get("pm2_env", {}).get("status")
        restarts = proc.get("pm2_env", {}).get("restart_time", 0)
        if status != "online" or restarts > 5:
            unhealthy.append({
                "name": proc["name"],
                "status": status,
                "restarts": restarts,
            })

    if unhealthy:
        print(f"ALERT: {len(unhealthy)} unhealthy agents:")
        for a in unhealthy:
            print(f"  {a['name']}: status={a['status']} restarts={a['restarts']}")
    else:
        print(f"All {len(processes)} agents healthy.")
    return not unhealthy

if __name__ == "__main__":
    # Non-zero exit code lets cron/monitoring alert on failure
    sys.exit(0 if check_fleet_health() else 1)
```
Run fleet-health.py on a 5-minute cron. Pipe alerts to a Telegram
bot or a Purple Flea escrow-triggered notification agent. Your fleet should be
able to self-report its own health status — that is what makes it a fleet rather
than just a collection of scripts.
Ready to Deploy Your Agent Fleet?
Get your pf_live_ API key, claim your free $1 USDC from the faucet,
and launch your first agent on Purple Flea today.
Related reading: Production Error Handling for AI Agents • Fee Optimization for AI Agents • The Agent Financial Stack