Upgrading a software service is a well-understood engineering problem. Upgrading an AI agent that is actively managing money — open positions, pending escrows, live casino sessions, domain registrations — is a different problem entirely. The cost of downtime isn't measured in user experience; it's measured in missed opportunities, stale state, and in the worst case, double-spending or abandoned funds.
This post covers the deployment patterns that production agent teams have converged on: blue-green, canary, shadow mode, state migration, and rollback. Each pattern trades complexity for risk reduction differently. Choose based on your agent's risk profile.
Three risks compound during agent upgrades: (1) Downtime risk — funds stagnate or miss time-sensitive operations. (2) Bug risk — new version has a defect that loses money. (3) State risk — position or balance state is corrupted during transition. Good upgrade patterns minimize all three simultaneously.
1. The Upgrade Problem
A typical software deployment has one constraint: don't serve errors during the transition. An agent deployment managing financial infrastructure has additional constraints that most DevOps literature ignores entirely:
- Idempotency: if the agent crashes mid-operation, restarting must not re-execute a transaction that already executed.
- State continuity: the new version must have access to all open positions, pending decisions, and cached state from the old version.
- API key handoff: Purple Flea API keys, casino session state, and escrow IDs must transfer cleanly — not be regenerated.
- Decision continuity: if the old version had decided to exit a position at 2x, the new version should honor that decision or explicitly override it with a logged reason.
The solution is never to "restart" — it's to transition. The patterns below are all variations on the same theme: keep v1 running in a safe state while v2 proves itself, then transfer control gracefully.
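The idempotency constraint listed above can be sketched as a ledger keyed by operation ID. This is a minimal illustration, not a description of any Purple Flea API: `IdempotentExecutor`, `new_operation_id`, and the in-memory dict are hypothetical names, and a real agent would keep the ledger in durable storage shared by both versions.

```python
import uuid


class IdempotentExecutor:
    """Runs each operation at most once, keyed by a caller-supplied ID."""

    def __init__(self):
        # Assumption: a dict stands in for durable storage shared by v1 and v2.
        self.ledger = {}

    def execute(self, op_id: str, action):
        # If this operation already completed, return the recorded result
        # instead of running the action a second time.
        if op_id in self.ledger:
            return self.ledger[op_id]
        result = action()
        self.ledger[op_id] = result
        return result


def new_operation_id() -> str:
    # Generate the ID when the decision is made, not when it is executed,
    # so a retry after a crash reuses the same ID.
    return uuid.uuid4().hex


calls = []
executor = IdempotentExecutor()
op_id = new_operation_id()
first = executor.execute(op_id, lambda: calls.append("bet") or "sent")
retry = executor.execute(op_id, lambda: calls.append("bet") or "sent")
print(first, retry, len(calls))
```

The sketch still has a gap: a crash after the action runs but before the ledger is written would allow a repeat. Closing it usually means recording intent before executing and reconciling on restart.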
2. Blue-Green Deployment
Blue-green runs two identical environments: blue (current) and green (new). At any given time only one is active, meaning it makes real decisions and executes transactions. The other is warm and ready. Switching from blue to green needs no restart: changing a single shared flag, such as a Redis key that both instances poll, directs the green instance to take over the decision loop.
    import time

    import redis


    class AgentController:
        # sync_state, read_state, decide, execute, and write_state are
        # supplied by the concrete agent and are not shown here.
        def __init__(self, version: str):
            self.version = version  # "blue" or "green"
            self.r = redis.Redis(host="localhost")
            self.active_key = "agent:active_version"

        def is_active(self) -> bool:
            active = self.r.get(self.active_key)
            return active is not None and active.decode() == self.version

        def run_loop(self):
            while True:
                if not self.is_active():
                    # Standby: read state, do not execute
                    self.sync_state()
                    time.sleep(1)
                    continue
                # Active: read state AND execute decisions
                state = self.read_state()
                decisions = self.decide(state)
                self.execute(decisions)
                self.write_state(state)


    # Switch from blue to green (run on operator machine)
    def switch_to_green():
        r = redis.Redis(host="localhost")
        r.set("agent:active_version", "green")
        print("Switched active to green")
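Rollback is equally fast: set the agent:active_version key back to blue. No restart is needed. Because both versions are warm, the switch takes effect as soon as each instance next checks the key. In the loop above, that is within about one second for the standby and at the start of the next cycle for the active instance. The main cost is running two instances, roughly 2x compute. That is acceptable for any financial agent where the cost of a bug exceeds the cost of extra compute.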
3. Canary Releases: 5% of Capital First
Canary releases split traffic (or in agent terms, capital allocation) between old and new versions. Rather than switching 100% of activity to v2, you route a small fraction — say, 5% of your betting budget or escrow volume — to v2 while v1 handles the rest. If v2 performs as expected for N rounds without errors, you incrementally increase its allocation.
    import random


    class CanaryRouter:
        def __init__(self, v2_fraction: float = 0.05):
            self.v2_fraction = v2_fraction  # share of bets sent to v2, 0.0 to 1.0
            self.v1_metrics = []
            self.v2_metrics = []

        def route(self, bet_amount: float) -> tuple[str, float]:
            # Returns (version, bet). Each bet goes whole to one version.
            # v2 is chosen with probability v2_fraction, so it handles that
            # share of capital on average. Scaling the bet as well would
            # apply the fraction twice.
            if random.random() < self.v2_fraction:
                return "v2", bet_amount
            return "v1", bet_amount

        def record(self, version: str, outcome: float):
            (self.v2_metrics if version == "v2" else self.v1_metrics).append(outcome)

        def should_promote(self, min_samples: int = 100, tolerance: float = 0.05) -> bool:
            if len(self.v2_metrics) < min_samples or not self.v1_metrics:
                return False
            v1_mean = sum(self.v1_metrics) / len(self.v1_metrics)
            v2_mean = sum(self.v2_metrics) / len(self.v2_metrics)
            # Promote if v2 is within tolerance of v1 or better. abs() keeps
            # the margin below v1 even when v1's mean PnL is negative.
            return v2_mean >= v1_mean - tolerance * abs(v1_mean)


    router = CanaryRouter(v2_fraction=0.05)
    version, bet = router.route(bet_amount=10.0)
    print(f"Route to {version}, bet ${bet:.2f}")
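The incremental increase described above can be sketched as a fixed ladder of allocation steps. The function name `next_fraction` and the particular steps (5%, 10%, 25%, 50%, 100%) are illustrative assumptions, not values taken from any production system.

```python
def next_fraction(current: float, healthy: bool,
                  steps=(0.05, 0.10, 0.25, 0.50, 1.0)) -> float:
    """Return the next share of capital to give v2.

    Healthy: advance one step up the ladder.
    Unhealthy: fall back to 0.0, which puts all capital back on v1.
    """
    if not healthy:
        return 0.0
    for step in steps:
        if step > current:
            return step
    return steps[-1]  # already fully promoted


fraction = 0.05
fraction = next_fraction(fraction, healthy=True)   # advance after a clean window
print(fraction)
```

Dropping to zero on any unhealthy window is a deliberately conservative choice. Stepping down one rung instead is a reasonable alternative when failures are noisy.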
4. Shadow Mode: Watch Before You Act
Shadow mode is the safest testing pattern: the new agent version runs alongside the old, receives the same inputs, computes decisions — but never executes them. All v2 actions are logged as "shadow actions." You can compare what v2 would have done to what v1 actually did, without any real-money risk.
    import time


    class ShadowAgent:
        def __init__(self, live_agent, shadow_agent):
            self.live = live_agent
            self.shadow = shadow_agent
            self.decisions = 0
            self.divergences = []

        def decide_and_execute(self, market_state: dict):
            # Live agent: decide and execute
            live_decision = self.live.decide(market_state)
            self.live.execute(live_decision)
            self.decisions += 1

            # Shadow agent: decide only, NO execute. A crash in v2 must not
            # interrupt the live path, so it is recorded as a divergence.
            try:
                shadow_decision = self.shadow.decide(market_state)
            except Exception as exc:
                shadow_decision = f"error: {exc!r}"

            # Log if shadow disagrees with live
            if shadow_decision != live_decision:
                self.divergences.append({
                    "state": market_state,
                    "live": live_decision,
                    "shadow": shadow_decision,
                    "timestamp": time.time(),
                })
                print(f"Divergence: live={live_decision} shadow={shadow_decision}")
            return live_decision

        def divergence_summary(self) -> str:
            # Reports how often v2 disagreed, not which version was right.
            total = len(self.divergences)
            return f"Shadow diverged {total} times in {self.decisions} decisions"
Shadow mode is particularly valuable for strategy changes: if v2 uses a different crash cash-out target or a different dice range selection, you can observe how that would have performed over hundreds of real rounds before committing capital.
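To make that comparison concrete, recorded rounds can be replayed against each candidate cash-out target. The sketch below assumes a simple payout model (a round pays bet × (target − 1) when the crash point reaches the target, and loses the bet otherwise). The function name and the sample crash points are made up for illustration.

```python
def replay_crash_target(crash_points, target: float, bet: float = 1.0) -> float:
    """Total PnL if every recorded round had used a fixed cash-out target."""
    pnl = 0.0
    for crash_point in crash_points:
        if crash_point >= target:
            pnl += bet * (target - 1.0)   # cashed out before the crash
        else:
            pnl -= bet                    # crashed before reaching the target
    return pnl


# Hypothetical crash points observed while v1 was live
recorded = [1.2, 3.5, 2.0, 1.05, 8.0, 2.4]
live_pnl = replay_crash_target(recorded, target=2.0)     # v1's target
shadow_pnl = replay_crash_target(recorded, target=3.0)   # v2's proposed target
print(f"live={live_pnl:+.2f} shadow={shadow_pnl:+.2f}")
```

Six rounds prove nothing either way. The point is only that shadow decisions can be scored against real outcomes without placing a bet.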
5. State Migration: Transferring Position and Balance State
Agent state is more complex than database rows. It includes: open casino positions, pending escrow IDs being monitored, cached API responses, rate-limit counters, decision context windows, and any ML model state. Every piece must transfer cleanly or the new agent starts blind — making decisions without context that v1 had accumulated.
    import json
    import os
    import time
    from pathlib import Path


    class AgentState:
        def __init__(self):
            # Empty defaults so restore() can build an instance and fill it in
            self.active_escrows = []
            self.positions = []
            self.last_action = None
            self.rate_tokens = None
            self.recent_decisions = []

        def snapshot(self) -> dict:
            return {
                "version": "1.0",
                "timestamp": time.time(),
                "escrow_ids": self.active_escrows,
                "open_positions": self.positions,
                "last_action_time": self.last_action,
                "rate_limit_tokens": self.rate_tokens,
                "decision_log": self.recent_decisions[-50:],  # last 50
            }

        def save(self, path: str = "/tmp/agent_state.json"):
            snap = self.snapshot()
            # Write to a temp file, then rename, so a crash mid-write
            # cannot leave a truncated snapshot at the target path.
            tmp = Path(path + ".tmp")
            tmp.write_text(json.dumps(snap, indent=2))
            os.replace(tmp, path)
            print(f"State saved: {len(snap['escrow_ids'])} escrows, {len(snap['open_positions'])} positions")

        @classmethod
        def restore(cls, path: str = "/tmp/agent_state.json"):
            data = json.loads(Path(path).read_text())
            age = time.time() - data["timestamp"]
            if age > 300:  # stale after 5 minutes
                raise ValueError(f"State is {age:.0f}s old, too stale to restore safely")
            state = cls()
            state.active_escrows = data["escrow_ids"]
            state.positions = data["open_positions"]
            state.last_action = data["last_action_time"]
            state.rate_tokens = data["rate_limit_tokens"]
            state.recent_decisions = data["decision_log"]
            return state
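A snapshot is only useful if the new version can trust it. One possible way to check integrity before restoring, sketched here under assumptions, is to store a SHA-256 digest next to the payload and confirm the required fields are present. `seal`, `unseal`, and `REQUIRED_KEYS` are hypothetical names; the key list mirrors the snapshot fields used above.

```python
import hashlib
import json

# Assumption: the minimum fields v2 needs before it may act on a snapshot
REQUIRED_KEYS = {"version", "timestamp", "escrow_ids", "open_positions"}


def seal(snapshot: dict) -> dict:
    """Wrap a snapshot with a SHA-256 digest of its canonical JSON form."""
    payload = json.dumps(snapshot, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return {"payload": payload, "sha256": digest}


def unseal(sealed: dict) -> dict:
    """Return the snapshot, or raise ValueError if it is corrupt or incomplete."""
    digest = hashlib.sha256(sealed["payload"].encode()).hexdigest()
    if digest != sealed["sha256"]:
        raise ValueError("snapshot checksum mismatch")
    snapshot = json.loads(sealed["payload"])
    missing = REQUIRED_KEYS - snapshot.keys()
    if missing:
        raise ValueError(f"snapshot missing keys: {sorted(missing)}")
    return snapshot


sealed = seal({"version": "1.0", "timestamp": 0.0,
               "escrow_ids": ["e1"], "open_positions": []})
restored = unseal(sealed)
print(restored["escrow_ids"])
```

A checksum catches truncation and accidental corruption only. It does not tell you whether the snapshot still matches what the API reports, which needs a separate reconciliation step.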
6. Rollback Strategy: When and How to Roll Back
Rollback is not failure. It is the correct response to a bug discovered in production. The critical question is not how to roll back (that is the easy part in blue-green) but when to roll back. Clear automatic triggers prevent losses caused by hesitation.
| Trigger Condition | Severity | Action | Timeframe |
|---|---|---|---|
| Any unhandled exception in execution path | Critical | Immediate rollback | <1s |
| API auth failure (bad key) | Critical | Immediate rollback | <1s |
| Double-spend detected in decision log | Critical | Immediate rollback + alert | <1s |
| PnL deviation >3 std devs from v1 baseline | High | Rollback after 10 rounds | <60s |
| Decision latency >2x v1 median | Medium | Alert, rollback if persists | <300s |
| Escrow monitoring gap detected | Medium | Rollback, audit escrows | <60s |
| PnL deviation <1 std dev from v1 | Low | Continue, log | — |
    import redis


    class RollbackGuard:
        def __init__(self, baseline_pnl_per_round: float, std_dev: float):
            if std_dev <= 0:
                raise ValueError("std_dev must be positive")
            self.baseline = baseline_pnl_per_round
            self.std = std_dev
            self.rounds = []
            self.rollback_triggered = False

        def check(self, round_pnl: float) -> bool:
            self.rounds.append(round_pnl)
            if len(self.rounds) < 10:
                return False  # need minimum samples
            # Mean of the last 10 rounds, measured in per-round std devs
            recent_mean = sum(self.rounds[-10:]) / 10
            deviation = abs(recent_mean - self.baseline) / self.std
            if deviation > 3.0:
                self.rollback_triggered = True
                print(f"ROLLBACK TRIGGERED: {deviation:.1f} std devs from baseline")
                return True
            return False

        def execute_rollback(self):
            # In blue-green: just switch active back to v1
            r = redis.Redis(host="localhost")
            r.set("agent:active_version", "blue")
            print("Rolled back to blue (v1)")
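The first row of the trigger table, any unhandled exception in the execution path, has nothing to do with PnL and can be enforced with a thin wrapper. The following is a sketch: `guarded_execute` is a hypothetical helper, and the `execute` and `rollback` callables stand in for whatever the agent actually uses.

```python
def guarded_execute(execute, rollback, decision):
    """Run execute(decision); on any exception, roll back and re-raise."""
    try:
        return execute(decision)
    except Exception:
        # Roll back first so control returns to v1 before anything else runs,
        # then re-raise so the failure still reaches logs and alerting.
        rollback()
        raise


events = []


def failing_execute(decision):
    raise RuntimeError("simulated defect in v2")


try:
    guarded_execute(failing_execute, lambda: events.append("rollback"), {"action": "bet"})
except RuntimeError as exc:
    events.append(f"alert: {exc}")

print(events)
```

Re-raising rather than swallowing the exception keeps the failure visible. The rollback restores safety, but someone still has to find out why v2 broke.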
7. Feature Flags: Enable New Strategies Without Redeployment
Feature flags let you enable or disable specific behaviors in a running agent without restarting or deploying a new version. For agents, flags commonly control: which game to play, what bet sizing formula to use, whether to accept new escrow requests, and which trading signals to act on.
Flags stored in Redis or a simple key-value store can be toggled by an operator in real time. The agent checks flags on each decision cycle. This decouples deployment (new binary) from activation (new behavior).
    import redis

    DEFAULTS = {
        "game": "coinflip",
        "bet_sizing": "kelly_floor",
        "crash_target": 2.0,
        "accept_escrows": True,
        "max_session_bets": 50,
        "stop_loss_pct": 0.20,
    }


    class Flags:
        def __init__(self):
            self.r = redis.Redis(host="localhost")

        def get(self, key: str):
            val = self.r.get(f"flag:{key}")
            if val is None:
                return DEFAULTS.get(key)
            # Deserialize (bool, float, str). Numbers always come back as
            # floats, so a stored 50 is returned as 50.0.
            decoded = val.decode()
            if decoded in ("True", "False"):
                return decoded == "True"
            try:
                return float(decoded)
            except ValueError:
                return decoded

        def set(self, key: str, value):
            self.r.set(f"flag:{key}", str(value))


    # Usage in decision loop
    flags = Flags()
    game = flags.get("game")            # "coinflip" or "dice" or "crash"
    target = flags.get("crash_target")  # 2.0 default
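Round-tripping flag values through str() and float() loses the distinction between integers and floats: max_session_bets stored as 50 comes back as 50.0. One alternative, sketched here with hypothetical helper names, is to store values as JSON so the type survives.

```python
import json
from typing import Any, Optional


def encode_flag(value: Any) -> str:
    """Serialize a flag value so its type survives the round trip."""
    return json.dumps(value)


def decode_flag(raw: Optional[bytes], default: Any = None) -> Any:
    """Decode a stored flag, falling back to the default when it is unset."""
    if raw is None:
        return default
    return json.loads(raw.decode())


# Bytes stand in for what a Redis GET would return
stored = encode_flag(50).encode()
print(decode_flag(stored), type(decode_flag(stored)).__name__)
```

The trade-off is that an operator setting a string flag by hand must quote it as JSON ("dice" rather than dice), which is easy to get wrong from a CLI.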
8. A/B Testing Agent Strategies in Production
A/B testing applies the canary pattern to strategy rather than version. Run two strategy variants simultaneously, split capital evenly, and measure which performs better over enough rounds for the difference to be statistically significant. Unlike canary releases (which compare v1 vs v2), A/B testing compares two hypotheses within the same version.
    import random
    import statistics


    class ABTest:
        def __init__(self, strategy_a: dict, strategy_b: dict):
            self.a = strategy_a
            self.b = strategy_b
            self.results_a = []
            self.results_b = []

        def assign(self) -> str:
            # Even split: each round goes to A or B with equal probability
            return "a" if random.random() < 0.5 else "b"

        def record(self, variant: str, pnl: float):
            (self.results_a if variant == "a" else self.results_b).append(pnl)

        def summary(self):
            if not self.results_a or not self.results_b:
                return "insufficient data"
            mean_a = statistics.mean(self.results_a)
            mean_b = statistics.mean(self.results_b)
            # Raw comparison of means; this does not test significance
            winner = "A" if mean_a > mean_b else "B"
            return f"A: {mean_a:+.4f} | B: {mean_b:+.4f} | Winner: {winner}"
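Comparing means alone will always name a winner, even when the gap is noise. One common way to gauge significance is Welch's t statistic, sketched below with the standard library only. The function name is illustrative, and each sample needs at least two results.

```python
import math
import statistics


def welch_t(a, b) -> float:
    """Welch's t statistic for the difference in means of two samples."""
    var_a = statistics.variance(a)   # sample variance, needs len(a) >= 2
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    if standard_error == 0:
        raise ValueError("both samples have zero variance")
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


t = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(f"t = {t:.3f}")
```

With large samples, an absolute value above roughly 2 corresponds to about the 5% level for a two-sided test. Per-round PnL from games of chance tends to be heavy-tailed, so treat that threshold as a rough guide rather than a guarantee.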
9. Versioning Your Agent's Decision Logic
Every significant change to an agent's decision logic should be versioned and logged. This isn't just for rollback; it's for auditability. When you need to explain why the agent made a specific bet or released an escrow at a specific time, you need to know exactly which version of the decision logic was running.
    import json
    import time

    AGENT_VERSION = "2.4.1"
    DECISION_LOG_PATH = "/var/log/agent/decisions.jsonl"


    def log_decision(
        decision_type: str,
        inputs: dict,
        output: dict,
        reasoning: str = "",
    ):
        entry = {
            "agent_version": AGENT_VERSION,
            "timestamp": time.time(),
            "decision_type": decision_type,
            "inputs": inputs,
            "output": output,
            "reasoning": reasoning,
        }
        # Append one JSON object per line (JSONL)
        with open(DECISION_LOG_PATH, "a") as f:
            f.write(json.dumps(entry) + "\n")


    # Usage
    log_decision(
        decision_type="crash_cashout",
        inputs={"current_multiplier": 1.87, "target": 2.0, "bet": 5.0},
        output={"action": "hold"},
        reasoning="multiplier below target, EV still positive to hold",
    )
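Reading the log back is what makes the version field useful. As a small sketch, the hypothetical helper below counts decisions per version from JSONL lines in the format written above. It takes any iterable of lines, so it works on an open file or on a list in a test.

```python
import json
from collections import Counter


def decisions_by_version(lines) -> Counter:
    """Count logged decisions per agent_version from JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        counts[json.loads(line)["agent_version"]] += 1
    return counts


# Made-up sample entries in the same shape as the decision log
sample = [
    '{"agent_version": "2.4.0", "decision_type": "crash_cashout"}',
    '{"agent_version": "2.4.1", "decision_type": "crash_cashout"}',
    '{"agent_version": "2.4.1", "decision_type": "escrow_release"}',
]
counts = decisions_by_version(sample)
print(dict(counts))
```

After a cutover, a count that keeps growing for the old version is a quick signal that v1 is still executing when it should be on standby.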
10. Deployment Checklist for Agent Upgrades
Before deploying any new agent version to production, verify every item on this checklist. The checklist is opinionated toward Purple Flea infrastructure but applies to any agent managing financial state.
Pre-deployment
- State snapshot saved — current agent state exported and verified parseable
- v2 tested in shadow mode — minimum 50 rounds of shadow comparison to v1
- API key validity confirmed — verify Purple Flea API key returns 200 from /me endpoint
- Escrow audit complete — all active escrow IDs listed, expected timeouts noted
- No open crash games — wait for any live crash sessions to conclude
- Rate limit counters noted — ensure v2 starts with current window state
- Rollback trigger thresholds configured — RollbackGuard initialized with v1 baseline metrics
During deployment (blue-green)
- v2 launched in standby — not active, syncing state from shared store
- v2 health check passes — /health endpoint returns 200, all dependencies connected
- State restored to v2 — snapshot from pre-deployment loaded and verified
- ACTIVE flag switched — Redis key set to green/v2
- First 10 decisions logged — manually verify decisions look sane
- Rollback metrics being collected: RollbackGuard recording round outcomes
Post-deployment (30 minutes)
- No rollback triggered — RollbackGuard stable, no 3+ std-dev deviations
- All active escrows still monitored — verify v2 is watching all pre-migration escrow IDs
- PnL trajectory matches expectation — within 1 std dev of v1 baseline
- Decision log being written — DECISION_LOG_PATH accumulating entries
- v1 kept warm for 24 hours — do not decommission until 24-hour stability confirmed
Never deploy a new agent version during a high-volatility market event, an ongoing escrow dispute, or within 30 minutes of a scheduled auto-release. Timing upgrades during quiet periods eliminates the most common class of state migration failures.
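Two of those three conditions are mechanical enough to check in code before a deploy script proceeds. The sketch below assumes the agent already knows its scheduled auto-release times (as Unix timestamps) and its count of open disputes. `safe_to_deploy` and its parameters are hypothetical, and market volatility is left to the operator's judgment.

```python
def safe_to_deploy(now: float, release_times, open_disputes: int,
                   buffer_s: float = 30 * 60) -> bool:
    """True if no dispute is open and no auto-release is within the buffer."""
    if open_disputes > 0:
        return False
    # "Within 30 minutes" is treated as either side of the release time
    return all(abs(t - now) >= buffer_s for t in release_times)


now = 1_000_000.0
print(safe_to_deploy(now, [now + 3600], open_disputes=0))   # release in 1 hour
print(safe_to_deploy(now, [now + 600], open_disputes=0))    # release in 10 minutes
```

Wiring a check like this into the deploy command turns the rule from something operators must remember into something the tooling enforces.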
Summary
Deploying AI agents that manage money safely requires treating upgrades as state transitions, not restarts:
- Blue-green: best for controlled, instant cutover with immediate rollback capability.
- Canary: best for risk-averse promotion of new strategies over days.
- Shadow mode: mandatory for any strategy change before capital exposure.
- State migration: always save and restore — never assume new instance has old context.
- Rollback triggers: define thresholds before deployment, not after a problem.
- Feature flags: decouple deployment from activation — ship code, turn on behavior separately.
- Decision logging: version every decision for auditability and rollback reconstruction.
Purple Flea's faucet gives new agents $1 USDC to test with — perfect for validating upgrade patterns before deploying agents with real capital. Start at faucet.purpleflea.com.