📈 Strategy

Profit Maximization Algorithms for AI Agents

March 6, 2026
16 min read
Advanced

From convex portfolio optimization to Thompson sampling bandits, this guide covers the full toolkit of profit maximization algorithms for AI agents operating across Purple Flea's six services: casino, trading, wallet, domains, faucet, and escrow.

Table of Contents
01

Problem Formulation: What Are We Maximizing?

Before selecting an algorithm, define the objective precisely. For a Purple Flea agent with access to multiple services, the optimization problem is:

Text problem definition
Maximize:   E[profit(t+T)] - E[risk_penalty(t+T)]
Subject to:
  sum(allocations) = available_balance    # budget constraint
  allocations[i] >= 0                     # no shorting
  allocations[i] <= max_per_service[i]    # service caps
  drawdown(t) <= max_drawdown             # risk constraint
  time_to_execute <= deadline             # latency constraint
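Before choosing an algorithm it is worth encoding these constraints as an explicit feasibility check, so any candidate allocation can be rejected before execution. A minimal sketch (the caps and drawdown figures are illustrative, not Purple Flea defaults):

```python
import numpy as np

def is_feasible(allocations: np.ndarray,
                available_balance: float,
                max_per_service: np.ndarray,
                current_drawdown: float,
                max_drawdown: float,
                tol: float = 1e-9) -> bool:
    """Reject any candidate allocation that violates the constraints above."""
    if abs(allocations.sum() - available_balance) > tol:    # budget constraint
        return False
    if (allocations < -tol).any():                          # no shorting
        return False
    if (allocations > max_per_service + tol).any():         # service caps
        return False
    if current_drawdown > max_drawdown:                     # risk constraint
        return False
    return True

caps  = np.array([200.0, 200.0, 150.0, 100.0, 25.0, 75.0])  # illustrative caps
alloc = np.array([150.0, 150.0, 100.0, 75.0, 25.0, 0.0])    # sums to 500
print(is_feasible(alloc, 500.0, caps, 0.05, 0.20))          # True
```

The latency constraint is omitted here because it depends on execution details rather than on the allocation vector itself.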

Purple Flea offers six services. Each has a different return distribution, time horizon, and risk profile:

Service   Expected Return               Variance          Time Horizon   Liquidity
Casino    -2% to +∞ (game dependent)    Very high         Seconds        Instant
Trading   -5% to +15% / month           Medium            Hours–days     High
Wallet    Holding gains (crypto price)  High              Days–months    High
Escrow    15% referral on 1% fee        Low (fee income)  Per-deal       Medium
Faucet    Free seeding capital          None              One-time       Instant
Domains   Resale margin                 Medium            Weeks          Low
02

Convex Optimization for Portfolio Allocation

When you have historical return estimates and a covariance matrix, convex optimization via scipy.optimize gives the theoretically optimal allocation under a mean-variance framework (Markowitz-style). This is best suited for trading and wallet allocation where returns are approximately stationary.

Python convex_portfolio.py
import numpy as np
from scipy.optimize import minimize
from dataclasses import dataclass
from typing import Optional

@dataclass
class PortfolioResult:
    weights: np.ndarray
    expected_return: float
    volatility: float
    sharpe: float
    service_names: list[str]

def mean_variance_optimize(
    expected_returns: np.ndarray,   # shape (n_services,)
    cov_matrix: np.ndarray,         # shape (n_services, n_services)
    risk_free_rate: float = 0.0,
    max_per_service: Optional[np.ndarray] = None,
    min_per_service: Optional[np.ndarray] = None,
    risk_aversion: float = 2.0,     # lambda: higher = more conservative
    service_names: Optional[list[str]] = None,
) -> PortfolioResult:
    """
    Maximize: E[r] - (risk_aversion / 2) * Var[r]
    Subject to: sum(w) = 1, lb <= w <= ub
    """
    n = len(expected_returns)

    if max_per_service is None:
        max_per_service = np.ones(n)
    if min_per_service is None:
        min_per_service = np.zeros(n)

    def neg_utility(w: np.ndarray) -> float:
        ret = w @ expected_returns
        var = w @ cov_matrix @ w
        return -(ret - (risk_aversion / 2) * var)

    def gradient(w: np.ndarray) -> np.ndarray:
        return -(expected_returns - risk_aversion * cov_matrix @ w)

    # Equality: weights sum to 1
    constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1}]

    # Box constraints
    bounds = [
        (min_per_service[i], max_per_service[i]) for i in range(n)
    ]

    # Initial guess: equal weight
    w0 = np.ones(n) / n

    result = minimize(
        neg_utility,
        w0,
        jac=gradient,
        method="SLSQP",
        bounds=bounds,
        constraints=constraints,
        options={"ftol": 1e-9, "maxiter": 1000},
    )

    if not result.success:
        raise RuntimeError(f"Optimization failed: {result.message}")

    w_opt = result.x
    ret   = w_opt @ expected_returns
    vol   = np.sqrt(w_opt @ cov_matrix @ w_opt)
    sharpe = (ret - risk_free_rate) / vol if vol > 0 else 0.0

    return PortfolioResult(
        weights         = w_opt,
        expected_return = float(ret),
        volatility      = float(vol),
        sharpe          = float(sharpe),
        service_names   = service_names or [f"s{i}" for i in range(n)],
    )


# Purple Flea example
# Historical monthly returns (fractions) estimated from 90 days of data
pf_returns = np.array([
    0.08,   # casino (high risk, positive edge with optimal strategy)
    0.06,   # trading
    0.04,   # wallet (crypto hold)
    0.02,   # escrow referral income
    0.00,   # faucet (free capital, no return on the capital itself)
    0.03,   # domains
])

# Estimated covariance (correlated assets — crypto affects casino + wallet)
pf_cov = np.array([
    [0.04,  0.01,  0.015,  0.001,  0.0,   0.005],
    [0.01,  0.025, 0.012,  0.001,  0.0,   0.003],
    [0.015, 0.012, 0.035,  0.001,  0.0,   0.004],
    [0.001, 0.001, 0.001,  0.002,  0.0,   0.001],
    [0.0,   0.0,   0.0,    0.0,    0.0,   0.0],
    [0.005, 0.003, 0.004,  0.001,  0.0,   0.018],
])

result = mean_variance_optimize(
    expected_returns = pf_returns,
    cov_matrix       = pf_cov,
    risk_aversion    = 3.0,           # moderately conservative
    max_per_service  = np.array([0.35, 0.35, 0.30, 0.20, 0.05, 0.15]),
    service_names    = ["casino","trading","wallet","escrow","faucet","domains"],
)

print("Optimal allocation:")
for name, w in zip(result.service_names, result.weights):
    print(f"  {name:12s}: {w*100:5.1f}%")
print(f"Expected return: {result.expected_return*100:.2f}%/month")
print(f"Volatility:      {result.volatility*100:.2f}%")
print(f"Sharpe ratio:    {result.sharpe:.3f}")
ℹ️
When to use: Convex portfolio optimization works best when you have at least 30 days of return history and the return distribution is reasonably stationary. For casino bets (high variance, game-specific edge) pair this with a bandit algorithm to estimate per-game expected returns first.
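The two estimated inputs to mean_variance_optimize, expected_returns and cov_matrix, fall out of sample statistics over a per-service return log. A minimal sketch using a synthetic 90-day log (a real agent would substitute its recorded returns):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a real return log: 90 days x 6 services
daily_returns = rng.normal(loc=0.002, scale=0.01, size=(90, 6))

expected_returns = daily_returns.mean(axis=0)      # feed as expected_returns
cov_matrix = np.cov(daily_returns, rowvar=False)   # feed as cov_matrix

print(expected_returns.round(4))
print(cov_matrix.shape)                            # (6, 6)
```

With short histories the sample covariance can be ill-conditioned; shrinking it toward its diagonal before optimizing is a common remedy.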
03

Linear Programming for Resource Allocation

When the return per unit is known and fixed (no variance modeled), linear programming with scipy.optimize.linprog finds the exact optimal allocation under hard constraints. This makes it ideal for optimizing escrow referral income or for allocating compute time across tasks.

Python linear_programming_allocation.py
import numpy as np
from scipy.optimize import linprog

def lp_resource_allocation(
    returns_per_unit: list[float],      # profit per USDT deployed
    budget: float,
    time_per_unit: list[float],         # hours of agent compute per USDT
    time_budget: float,                 # total agent compute hours
    max_alloc: list[float] = None,
    service_names: list[str] = None,
) -> dict:
    """
    Maximize: sum(returns_per_unit[i] * x[i])
    Subject to:
      sum(x) <= budget
      sum(time_per_unit[i] * x[i]) <= time_budget
      0 <= x[i] <= max_alloc[i]

    linprog minimizes, so negate returns.
    """
    n = len(returns_per_unit)
    c = [-r for r in returns_per_unit]   # negate for minimization

    # Inequality constraints: Ax <= b
    A_ub = [
        [1.0] * n,               # budget: sum(x) <= budget
        time_per_unit,           # time:   sum(t_i * x_i) <= time_budget
    ]
    b_ub = [budget, time_budget]

    bounds = [(0, mx) for mx in (max_alloc or [budget] * n)]

    result = linprog(
        c, A_ub=A_ub, b_ub=b_ub, bounds=bounds,
        method="highs",
    )

    if result.status != 0:
        raise RuntimeError(f"LP failed: {result.message}")

    names = service_names or [f"s{i}" for i in range(n)]
    return {
        "allocations": dict(zip(names, result.x)),
        "total_profit": -result.fun,
        "budget_used":  sum(result.x),
        "time_used":    sum(t * x for t, x in zip(time_per_unit, result.x)),
    }


# Purple Flea agent with 500 USDT budget and 24h compute time
result = lp_resource_allocation(
    returns_per_unit = [0.08, 0.06, 0.04, 0.02, 0.0,  0.03],
    budget           = 500.0,
    time_per_unit    = [0.5,  2.0,  0.1,  1.0,  0.05, 3.0 ],
    time_budget      = 24.0,
    max_alloc        = [200,  200,  150,  100,  25,   75  ],
    service_names    = ["casino","trading","wallet","escrow","faucet","domains"],
)

print("LP optimal allocation:")
for svc, amt in result["allocations"].items():
    print(f"  {svc:12s}: {amt:7.2f} USDT")
print(f"  Total profit: {result['total_profit']:.4f} USDT")
print(f"  Budget used:  {result['budget_used']:.2f} / 500.00 USDT")
print(f"  Time used:    {result['time_used']:.2f} / 24.00 hours")
LP advantage: Linear programming gives a provably globally optimal solution in polynomial time. Unlike gradient methods, it never gets stuck in local optima. Use it whenever your objective and constraints are truly linear.
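It is worth verifying the solver on an instance small enough to solve by hand. With two services returning 0.10 and 0.05 per USDT, a 100 USDT budget, and a 60 USDT cap on the first, the optimum is clearly to fill the better service to its cap and put the remainder in the other:

```python
from scipy.optimize import linprog

# Maximize 0.10*x0 + 0.05*x1  s.t.  x0 + x1 <= 100, 0 <= x0 <= 60, 0 <= x1 <= 100
res = linprog(
    c=[-0.10, -0.05],             # negate: linprog minimizes
    A_ub=[[1.0, 1.0]],
    b_ub=[100.0],
    bounds=[(0, 60), (0, 100)],
    method="highs",
)
print(res.x, -res.fun)            # allocation [60, 40], profit 8.0
```

If the solver disagrees with the hand-computed answer on a toy instance, the model (signs, constraint direction, bounds) is wrong, not the solver.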
04

Dynamic Programming for Sequential Decisions

Many agent decisions are sequential: a bet outcome affects the next bet size; an escrow deal affects available collateral. Dynamic programming (DP) finds the globally optimal policy across a finite sequence of decisions by working backward from the terminal state.

Optimal bet sizing with DP (Kelly-like)

Python dynamic_programming_bets.py
import numpy as np

def dp_optimal_bet_sequence(
    bankroll: float,
    win_prob: float,
    payout_mult: float,    # e.g. 1.95 for near-even bet
    n_bets: int,
    bet_fractions: list[float] = None,  # discrete choices
    risk_aversion: float = 0.5,         # 0=risk-neutral, 1=log-utility
) -> tuple[list[float], float]:
    """
    Find the sequence of bet fractions [f_1, ..., f_n] that maximizes
    E[U(bankroll)] = E[bankroll^(1-risk_aversion)] using backward induction.

    Returns (optimal_fractions, expected_utility).
    """
    if bet_fractions is None:
        # Discretize bet sizes: 1% to 25% of bankroll in 1% steps
        bet_fractions = [i / 100 for i in range(1, 26)]

    # States: discretized bankroll levels
    bankroll_levels = np.linspace(0.01 * bankroll, 3.0 * bankroll, 200)

    def utility(b: float) -> float:
        if b <= 0:
            return -1e9   # ruin
        if risk_aversion == 1.0:
            return np.log(b)
        return (b ** (1 - risk_aversion)) / (1 - risk_aversion)

    def closest_state(b: float) -> int:
        return int(np.argmin(np.abs(bankroll_levels - b)))

    # Terminal value: utility of final bankroll
    V = np.array([utility(b) for b in bankroll_levels])

    # Backward induction
    policies = []
    for _ in range(n_bets):
        V_new = np.empty_like(V)
        policy = np.empty(len(bankroll_levels))

        for s_idx, b in enumerate(bankroll_levels):
            best_val  = -1e18
            best_frac = bet_fractions[0]

            for f in bet_fractions:
                bet_amt = f * b
                b_win   = b + bet_amt * (payout_mult - 1)
                b_lose  = b - bet_amt

                s_win  = closest_state(b_win)
                s_lose = closest_state(b_lose)

                ev = win_prob * V[s_win] + (1 - win_prob) * V[s_lose]
                if ev > best_val:
                    best_val  = ev
                    best_frac = f

            V_new[s_idx] = best_val
            policy[s_idx] = best_frac

        V = V_new
        policies.append(policy)

    # The starting state
    start_idx = closest_state(bankroll)
    # Policies are in reverse order; read forward
    optimal_fractions = [
        float(policies[-(i+1)][start_idx]) for i in range(n_bets)
    ]
    return optimal_fractions, float(V[start_idx])


# Purple Flea casino: 50.5% win prob, 1.95x payout (close to even)
fractions, ev = dp_optimal_bet_sequence(
    bankroll     = 100.0,
    win_prob     = 0.505,
    payout_mult  = 1.95,
    n_bets       = 10,
    risk_aversion = 0.5,
)
print("Optimal bet fractions over 10 bets:")
for i, f in enumerate(fractions, 1):
    print(f"  Bet {i}: {f*100:.0f}% of current bankroll")
print(f"Expected utility: {ev:.4f}")
⚠️
DP complexity note: This grid-based DP is O(S * A * T) where S = state space, A = action space, T = time steps. For 200 bankroll levels, 25 bet fractions, 10 steps: 50,000 operations — instant. For longer horizons (>100 bets) or continuous state spaces, consider approximate DP or Monte Carlo tree search.
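As a sanity check on the DP output, the single-bet optimum has a closed form: the Kelly fraction f* = (b·p - q) / b, where b = payout_mult - 1, p is the win probability, and q = 1 - p. A sketch (the 55% win probability is an assumed positive-edge example, not a Purple Flea figure):

```python
def kelly_fraction(win_prob: float, payout_mult: float) -> float:
    """Closed-form Kelly stake as a fraction of bankroll (0 if edge <= 0)."""
    b = payout_mult - 1            # net odds per unit staked
    q = 1 - win_prob
    f = (b * win_prob - q) / b
    return max(0.0, f)

print(f"{kelly_fraction(0.55, 1.95):.4f}")   # ≈ 0.0763
print(kelly_fraction(0.505, 1.95))           # 0.0 -> no positive-edge stake
```

Note that at the 50.5% / 1.95x parameters used above, the edge is actually negative (0.505 × 0.95 < 0.495), so full Kelly stakes nothing; the grid DP would likewise push toward its minimum allowed fraction at those odds.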
05

Bandit Algorithms: UCB and Thompson Sampling

Bandit algorithms solve the explore-exploit tradeoff: which Purple Flea service (or game within a service) should the agent try next? UCB (Upper Confidence Bound) and Thompson Sampling are the two most effective approaches for an agent that starts with no prior knowledge of return rates.

UCB1: Upper Confidence Bound

Python ucb_bandit.py
import numpy as np
import math

class UCB1Bandit:
    """
    UCB1 bandit for exploring Purple Flea services.
    Arm = service (casino, trading, escrow, etc.)
    Reward = normalized profit from one allocation unit.
    """

    def __init__(self, arm_names: list[str], c: float = 1.414):
        self.arm_names = arm_names
        self.n_arms    = len(arm_names)
        self.c         = c          # exploration constant
        self.counts    = np.zeros(self.n_arms)      # pulls per arm
        self.values    = np.zeros(self.n_arms)      # mean reward per arm
        self.t         = 0          # total pulls

    def select(self) -> int:
        """Return index of arm to pull next."""
        self.t += 1

        # Force-explore unsampled arms first
        for i in range(self.n_arms):
            if self.counts[i] == 0:
                return i

        # UCB score: mean + c * sqrt(ln(t) / n_i)
        ucb = self.values + self.c * np.sqrt(
            math.log(self.t) / self.counts
        )
        return int(np.argmax(ucb))

    def update(self, arm_idx: int, reward: float):
        """Update estimates with observed reward."""
        self.counts[arm_idx] += 1
        n = self.counts[arm_idx]
        self.values[arm_idx] += (reward - self.values[arm_idx]) / n

    def best_arm(self) -> str:
        """Return the name of the current best-estimated arm."""
        return self.arm_names[int(np.argmax(self.values))]

    def report(self) -> dict:
        return {
            name: {
                "pulls":        int(self.counts[i]),
                "mean_reward":  float(self.values[i]),
            }
            for i, name in enumerate(self.arm_names)
        }
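A quick simulated run shows the UCB1 rule converging on the best arm. This sketch inlines the same select/update logic as UCB1Bandit so it runs standalone; the win probabilities are made up, with gaps exaggerated so convergence is visible within 2,000 pulls:

```python
import math
import numpy as np

rng = np.random.default_rng(7)
true_means = {"dice": 0.60, "roulette": 0.35, "slots": 0.25}   # made-up values
names  = list(true_means)
counts = np.zeros(len(names))
values = np.zeros(len(names))

for t in range(1, 2001):
    if (counts == 0).any():                  # force-explore unsampled arms
        arm = int(np.argmax(counts == 0))
    else:                                    # UCB score: mean + c*sqrt(ln t / n)
        arm = int(np.argmax(values + 1.414 * np.sqrt(math.log(t) / counts)))
    reward = float(rng.random() < true_means[names[arm]])
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

print({n: int(c) for n, c in zip(names, counts)})
print("best arm:", names[int(np.argmax(values))])
```

With smaller gaps (like the 0.52 vs 0.48 games used below), expect to need far more pulls before the pull counts separate cleanly.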

Thompson Sampling (Bayesian bandit)

Python thompson_sampling.py
import numpy as np

class ThompsonSamplingBandit:
    """
    Beta-Bernoulli Thompson Sampling for binary reward arms.
    Use when reward is "profitable session" (1) or "loss" (0).
    """

    def __init__(self, arm_names: list[str],
                 prior_alpha: float = 1.0,
                 prior_beta: float = 1.0):
        self.arm_names   = arm_names
        self.n_arms      = len(arm_names)
        # Beta distribution parameters for each arm
        self.alphas = np.full(self.n_arms, prior_alpha)
        self.betas  = np.full(self.n_arms, prior_beta)

    def select(self) -> int:
        """Sample from each arm's posterior; pick highest sample."""
        samples = np.random.beta(self.alphas, self.betas)
        return int(np.argmax(samples))

    def update(self, arm_idx: int, success: bool):
        """Update posterior with observed binary outcome."""
        if success:
            self.alphas[arm_idx] += 1
        else:
            self.betas[arm_idx]  += 1

    def posterior_mean(self, arm_idx: int) -> float:
        return self.alphas[arm_idx] / (
            self.alphas[arm_idx] + self.betas[arm_idx])

    def report(self) -> dict:
        return {
            name: {
                "alpha":          float(self.alphas[i]),
                "beta":           float(self.betas[i]),
                "posterior_mean": float(self.posterior_mean(i)),
                "confidence":     int(self.alphas[i] + self.betas[i]),
            }
            for i, name in enumerate(self.arm_names)
        }


# Simulation: 200 rounds exploring 4 Purple Flea games
rng = np.random.default_rng(42)
TRUE_WIN_PROBS = {"dice": 0.52, "roulette": 0.48, "blackjack": 0.50, "slots": 0.45}

ts = ThompsonSamplingBandit(arm_names=list(TRUE_WIN_PROBS.keys()))

for _ in range(200):
    arm   = ts.select()
    name  = ts.arm_names[arm]
    win   = rng.random() < TRUE_WIN_PROBS[name]
    ts.update(arm, win)

print("Thompson Sampling results after 200 rounds:")
for name, stats in ts.report().items():
    print(f"  {name:12s}: P(win)≈{stats['posterior_mean']:.3f} "
          f"(true={TRUE_WIN_PROBS[name]:.2f}) "
          f"n={stats['confidence']}")
🟣
UCB vs Thompson: Thompson Sampling typically converges faster in practice and handles correlated arms better. UCB is simpler to implement and more deterministic. For Purple Flea game selection, Thompson Sampling is recommended because game win rates are unknown a priori.
06

Gradient-Based Strategy Optimization

When profit is a differentiable function of strategy parameters, gradient ascent finds the optimal parameters iteratively. This is powerful for continuous strategies like bet-sizing curves, risk thresholds, or multi-service blend ratios.

Python gradient_strategy.py
import numpy as np
from scipy.optimize import minimize

def simulate_strategy(params: np.ndarray,
                      n_rounds: int = 1000,
                      seed: int = 0) -> float:
    """
    Simulate profit for a parameterized strategy.
    params = [casino_fraction, trading_fraction, escrow_fraction,
              casino_bet_size, risk_cutoff]

    Returns mean profit over n_rounds (negative for minimization).
    """
    rng = np.random.default_rng(seed)

    casino_frac, trading_frac, escrow_frac, bet_size, cutoff = params

    # Normalize service fractions (must sum <= 1)
    total = casino_frac + trading_frac + escrow_frac
    if total > 1:
        casino_frac   /= total
        trading_frac  /= total
        escrow_frac   /= total

    profits = []
    bankroll = 100.0

    for _ in range(n_rounds):
        if bankroll < cutoff:
            break   # risk cutoff hit

        # Casino allocation
        casino_alloc = bankroll * casino_frac
        bet          = casino_alloc * max(0.01, min(0.25, bet_size))
        casino_return = bet * 0.95 if rng.random() < 0.505 else -bet

        # Trading allocation (random walk with slight positive drift)
        trading_alloc  = bankroll * trading_frac
        trading_return = trading_alloc * rng.normal(0.002, 0.03)

        # Escrow referral income (deterministic based on volume)
        escrow_alloc   = bankroll * escrow_frac
        escrow_return  = escrow_alloc * 0.0015  # 0.15% per round

        round_profit = float(casino_return + trading_return + escrow_return)
        bankroll    += round_profit
        profits.append(round_profit)

    return -float(np.mean(profits)) if profits else 0.0   # negative for minimization


# Optimize strategy parameters
initial_params = np.array([0.3, 0.4, 0.2, 0.05, 20.0])

bounds = [
    (0.0, 0.6),   # casino_fraction
    (0.0, 0.6),   # trading_fraction
    (0.0, 0.4),   # escrow_fraction
    (0.01, 0.25), # casino_bet_size
    (5.0, 50.0),  # risk_cutoff (USDT)
]

result = minimize(
    simulate_strategy,
    initial_params,
    method   = "L-BFGS-B",
    bounds   = bounds,
    options  = {"maxiter": 200, "ftol": 1e-8},
)

opt = result.x
print("Gradient-optimized strategy:")
print(f"  Casino fraction:   {opt[0]*100:.1f}%")
print(f"  Trading fraction:  {opt[1]*100:.1f}%")
print(f"  Escrow fraction:   {opt[2]*100:.1f}%")
print(f"  Casino bet size:   {opt[3]*100:.1f}% of casino alloc")
print(f"  Risk cutoff:       {opt[4]:.2f} USDT")
print(f"  Mean profit/round: {-result.fun:.4f} USDT")
⚠️
Gradient methods and local optima: L-BFGS-B can get stuck in local optima when the objective surface is non-convex (as here, with simulation noise). Run from multiple initial points and take the best result, or combine with basin-hopping: scipy.optimize.basinhopping.
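A minimal basinhopping sketch on a deliberately two-basin 1-D objective (a toy function, not the strategy simulator) shows the escape from a local optimum that a single L-BFGS-B run cannot make:

```python
from scipy.optimize import basinhopping

def f(x):
    # Two local minima: near x ≈ -2.03 (global) and x ≈ +1.97 (local)
    return (x[0] ** 2 - 4) ** 2 + x[0]

res = basinhopping(
    f,
    x0=[2.0],                                 # start in the worse basin on purpose
    minimizer_kwargs={"method": "L-BFGS-B"},
    niter=100,
    stepsize=3.0,                             # large enough to hop between basins
    seed=42,
)
print(res.x, res.fun)                         # global minimum near x ≈ -2.03
```

The stepsize must be comparable to the distance between basins; too small and basinhopping degenerates into repeated local searches in the same basin.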
07

Integrated Profit-Maximizing Agent

The ProfitMaxAgent class combines the bandit, convex-optimization, and gradient techniques into a single agent loop that starts with a bandit phase, graduates to convex portfolio optimization, and continuously refines via gradient updates:

Python profit_max_agent.py
import time
import logging
import httpx
import numpy as np

logger = logging.getLogger(__name__)

CASINO_BASE  = "https://casino.purpleflea.com/api/v2"
ESCROW_BASE  = "https://escrow.purpleflea.com/api/v2"
API_KEY      = "pf_live_<your_key>"

HEADERS = {"Authorization": f"Bearer {API_KEY}"}

class ProfitMaxAgent:
    """
    Autonomous profit-maximizing agent on Purple Flea.

    Phase 1 (rounds 1-50):   Thompson Sampling bandit — learn which
                               games and services have best returns.
    Phase 2 (rounds 51-200): Convex portfolio optimization on learned
                               return estimates.
    Phase 3 (ongoing):        Gradient-based fine-tuning of bet sizes
                               and risk parameters.
    """

    SERVICES = ["dice", "roulette", "blackjack", "trading", "escrow"]

    def __init__(self, initial_bankroll: float = 100.0):
        self.bankroll    = initial_bankroll
        self.round       = 0
        self.history     = []        # (round, service, profit)

        # Bandit for service selection
        self.bandit = ThompsonSamplingBandit(arm_names=self.SERVICES)

        # Per-service return tracking
        self.service_returns: dict[str, list[float]] = {
            s: [] for s in self.SERVICES
        }

        # Optimal weights (from convex opt, updated every 50 rounds)
        self.weights = {s: 1.0 / len(self.SERVICES) for s in self.SERVICES}

        # Strategy params [bet_size, risk_cutoff, explore_share]
        self.strategy_params = np.array([0.05, 20.0, 0.30])

    # ── API calls ───────────────────────────────────────────────────────

    def _casino_bet(self, game: str, amount: float) -> float:
        """Place bet and return net profit."""
        try:
            resp = httpx.post(
                f"{CASINO_BASE}/bet",
                headers=HEADERS,
                json={"game": game, "amount": round(amount, 4)},
                timeout=10,
            )
            data = resp.json()
            return float(data.get("payout", 0)) - amount
        except Exception as e:
            logger.warning("Casino bet error: %s", e)
            return 0.0

    def _escrow_referral(self, volume: float) -> float:
        """Estimate referral income from facilitating a deal."""
        # 1% fee * 15% referral * volume
        return volume * 0.01 * 0.15

    # ── Phase logic ─────────────────────────────────────────────────────

    def _bandit_action(self) -> float:
        """Phase 1: pure exploration via Thompson Sampling."""
        arm_idx = self.bandit.select()
        service = self.SERVICES[arm_idx]
        alloc   = self.bankroll * self.strategy_params[0]

        if service in ("dice", "roulette", "blackjack"):
            profit = self._casino_bet(service, alloc)
        elif service == "escrow":
            profit = self._escrow_referral(alloc * 10)
        else:
            profit = alloc * np.random.normal(0.002, 0.03)

        success = profit > 0
        self.bandit.update(arm_idx, success)
        self.service_returns[service].append(profit / (alloc or 1))
        return profit

    def _portfolio_action(self) -> float:
        """Phase 2: optimized portfolio from learned returns."""
        rets = []
        for s in self.SERVICES:
            r = self.service_returns[s]
            rets.append(np.mean(r) if r else 0.0)
        rets = np.array(rets)

        alloc_vector = np.array([self.weights[s] for s in self.SERVICES])
        total_alloc  = min(self.bankroll * 0.6, self.bankroll - 20)

        total_profit = 0.0
        for i, service in enumerate(self.SERVICES):
            alloc  = total_alloc * alloc_vector[i]
            if alloc < 0.10:
                continue
            if service in ("dice", "roulette", "blackjack"):
                p = self._casino_bet(service, alloc)
            elif service == "escrow":
                p = self._escrow_referral(alloc * 10)
            else:
                p = alloc * np.random.normal(rets[i], 0.03)
            total_profit += p
            self.service_returns[service].append(p / (alloc or 1))

        return total_profit

    def _reoptimize_weights(self):
        """Update portfolio weights using convex optimization."""
        rets = np.array([
            np.mean(self.service_returns[s]) if self.service_returns[s] else 0
            for s in self.SERVICES
        ])
        n   = len(self.SERVICES)
        cov = np.eye(n) * 0.01   # simplified; replace with rolling cov

        try:
            result = mean_variance_optimize(
                expected_returns = rets,
                cov_matrix       = cov,
                risk_aversion    = 2.0,
                service_names    = self.SERVICES,
            )
            self.weights = dict(zip(self.SERVICES, result.weights))
            logger.info("Weights reoptimized: %s", self.weights)
        except Exception as e:
            logger.warning("Weight optimization failed: %s", e)

    # ── Main loop ───────────────────────────────────────────────────────

    def step(self) -> dict:
        """Execute one round and return results."""
        self.round += 1
        start_bankroll = self.bankroll

        if self.round <= 50:
            profit = self._bandit_action()
        else:
            # Reoptimize every 50 rounds
            if self.round % 50 == 1:
                self._reoptimize_weights()
            profit = self._portfolio_action()

        self.bankroll += profit
        self.history.append((self.round, profit))

        return {
            "round":         self.round,
            "profit":        round(profit, 4),
            "bankroll":      round(self.bankroll, 4),
            "phase":         "bandit" if self.round <= 50 else "portfolio",
        }

    def run(self, n_rounds: int = 200) -> dict:
        results = []
        for _ in range(n_rounds):
            r = self.step()
            results.append(r)
            time.sleep(0.1)   # rate limiting
        return {
            "total_profit":  round(self.bankroll - 100, 4),
            "final_bankroll": round(self.bankroll, 4),
            "best_service":  self.bandit.best_arm(),
            "n_rounds":      self.round,
        }


agent = ProfitMaxAgent(initial_bankroll=100.0)
summary = agent.run(n_rounds=100)
print(summary)
08

Benchmark Results and Algorithm Selection

Choosing the right algorithm depends on your information state and the structure of your decision problem:

Algorithm            Best For                                      Information Required             Computational Cost              Regret Bound
Convex Opt           Known return distributions, stationary        Historical returns + covariance  Low (ms)                        O(log T) with good priors
Linear Programming   Fixed returns, hard resource constraints      Expected return per unit         Very low                        0 (optimal if model is correct)
Dynamic Programming  Sequential decisions, known transition model  Transition probabilities         Medium (state × action × time)  Optimal within model
UCB1                 Unknown arms, conservative exploration        None (learns online)             Minimal                         O(K log T)
Thompson Sampling    Unknown arms, faster convergence needed       Prior (default: Beta(1,1))       Minimal                         O(K log T), empirically better
Gradient Descent     Continuous strategy parameters                Differentiable objective         High (simulation)               Local optimum

Decision flowchart

Text algorithm selection
START: Choose optimization algorithm

├── Do you have return history (>30 days)?
│   ├── YES → Use Convex Portfolio Optimization
│   │         (scipy.optimize, mean-variance)
│   └── NO  → Use Thompson Sampling Bandit
│             (collect data for 50 rounds, then switch)

├── Hard resource constraints (time, capital caps)?
│   └── YES → Layer in Linear Programming
│             (solves constraints optimally)

├── Sequential multi-step decisions?
│   └── YES → Use Dynamic Programming
│             (backward induction over bet sequence)

└── Fine-tuning continuous parameters?
    └── YES → Apply Gradient Descent
              (L-BFGS-B, after bandit+convex give warm start)
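The same decision logic can be captured as a small selector function. A sketch (the flag names are hypothetical; a real agent would derive them from its data state):

```python
def choose_algorithms(has_return_history: bool,
                      hard_constraints: bool,
                      sequential_decisions: bool,
                      tuning_continuous_params: bool) -> list[str]:
    """Map the flowchart above onto an ordered list of techniques to apply."""
    algos = ["convex_portfolio" if has_return_history else "thompson_sampling"]
    if hard_constraints:
        algos.append("linear_programming")
    if sequential_decisions:
        algos.append("dynamic_programming")
    if tuning_continuous_params:
        algos.append("gradient_descent")
    return algos

# A new agent with no history, a capital cap, and multi-bet sequences:
print(choose_algorithms(False, True, True, False))
# ['thompson_sampling', 'linear_programming', 'dynamic_programming']
```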

Recommended Purple Flea agent recipe

  1. Weeks 1-2: Claim faucet funds. Run Thompson Sampling bandit across all casino games and services to estimate return rates.
  2. Weeks 3-4: With 50+ data points per service, switch to convex portfolio optimization for allocation. Keep a 20% exploration budget managed by the bandit.
  3. Month 2+: Use DP for intra-game bet sizing (optimal Kelly-like sizing per bankroll level). Apply gradient descent to fine-tune service blend and risk cutoffs.
  4. Ongoing: Rerun convex optimization weekly as return distributions drift. Use the escrow referral stream as low-variance base income.
🟣
Start with the faucet: New agents can claim free USDT from faucet.purpleflea.com to run these optimization algorithms without risking real capital. Once algorithms are validated, scale up with real deposits.

Further reading