Multi-Model Agent Systems
A single LLM is a generalist. A multi-model agent system is a specialist team: each model is assigned the tasks where it excels, orchestrated by a lightweight router that can cut costs by 70-80% while improving output quality on high-stakes decisions. This guide walks through the architecture, the Python implementation, and how to wire it into Purple Flea's financial infrastructure.
What you'll build: a Python `ModelRouter` class that classifies incoming tasks by type and complexity, routes each to the optimal LLM (Claude Opus, GPT-4o, Gemini Flash, or a local Llama), aggregates ensemble votes for high-stakes trades, and uses model disagreement as an uncertainty signal to pause or reduce position size.
Why Multi-Model Over Single-Model
The instinct to use the best available model for every task is costly and often counterproductive. A frontier model like Claude Opus 4 costs ~$15/M input tokens and ~$75/M output tokens. A fast, cheap model like Gemini 1.5 Flash costs ~$0.075/M input tokens, 200x cheaper. For most tasks inside a financial agent loop (JSON parsing, simple data transformations, routine API calls), the cheaper model performs identically.
Multi-model systems also expose something single-model systems cannot: disagreement. When three models agree on a trading decision, confidence is high. When they diverge, that divergence is itself a signal: the situation is ambiguous, and the appropriate response is a smaller position size or a human review flag.
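The aggregation implemented later in `_aggregate_ensemble` boils this down to a single number: the fraction of models that voted against the majority. A minimal standalone sketch of the idea:

```python
from collections import Counter

def disagreement_signal(votes: list[str]) -> dict:
    """Turn per-model decisions into a majority vote plus an uncertainty
    score in [0, 1]: 0 means unanimous, higher means more disagreement."""
    counts = Counter(votes)
    decision, top = counts.most_common(1)[0]
    agreement = top / len(votes)
    return {"decision": decision, "uncertainty": 1.0 - agreement}

# Unanimous vote: zero uncertainty. A 2-1 split: uncertainty of 1/3.
print(disagreement_signal(["buy", "buy", "buy"]))
print(disagreement_signal(["buy", "buy", "hold"]))
```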
Cost breakdown across model tiers (per million tokens, input / output):

| Model | Input / M | Output / M |
|---|---|---|
| Llama 3.1 8B (local) | ~$0.02 | ~$0.02 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
| GPT-4o-mini | $0.15 | $0.60 |
| GPT-4o | $2.50 | $10.00 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Claude Opus 4 | $15.00 | $75.00 |
Model Taxonomy for Financial Agents
Different financial agent tasks require different model capabilities. Matching task requirements to model strengths is the core skill of multi-model system design.
Claude Opus 4
- Complex reasoning chains
- Nuanced risk assessment
- Long-context financial docs
- Regulatory interpretation
GPT-4o
- Function calling reliability
- Structured JSON output
- Code generation (Python/JS)
- Tool use chains
Gemini 1.5 Flash
- High-throughput classification
- Simple sentiment scoring
- Data extraction tasks
- Routine summarization
Llama 3.1 8B (local)
- Near-zero-latency classification
- Private/offline tasks
- High-frequency routing
- No data egress
Task-to-model routing table
| Task Type | Complexity | Default Model | Escalate To |
|---|---|---|---|
| JSON parsing / extraction | Low | Gemini Flash | (none) |
| News sentiment classification | Low | Llama 3.1 8B | Gemini Flash |
| Market summary generation | Medium | GPT-4o-mini | GPT-4o |
| Trading signal reasoning | High | Claude Sonnet | Claude Opus 4 |
| Risk analysis (large position) | Very High | Claude Opus 4 | Ensemble |
| Smart contract analysis | High | GPT-4o | Claude Opus 4 |
| Portfolio optimization | Very High | Ensemble (3 models) | Human review |
| Routine data transformation | Very Low | Llama 3.1 8B | (none) |
Router Architecture
The router is a lightweight classifier that sits in front of all LLM calls. It accepts the task prompt and metadata (task type, urgency, dollar value at stake) and returns a model selection with an optional ensemble configuration.
Request Flow
Python ModelRouter Implementation
import asyncio
import httpx
import json
from dataclasses import dataclass
from enum import Enum
from typing import Any
class TaskType(Enum):
CLASSIFICATION = "classification"
EXTRACTION = "extraction"
SUMMARIZATION = "summarization"
REASONING = "reasoning"
CODE = "code"
RISK_ANALYSIS = "risk_analysis"
TRADING_SIGNAL = "trading_signal"
PORTFOLIO_OPT = "portfolio_optimization"
class Complexity(Enum):
TRIVIAL = 0
LOW = 1
MEDIUM = 2
HIGH = 3
CRITICAL = 4
@dataclass
class ModelSpec:
name: str
provider: str # "anthropic" | "openai" | "google" | "local"
model_id: str
cost_per_1m_in: float # USD
cost_per_1m_out: float
max_context: int
supports_json_mode: bool = True
latency_ms_p50: int = 500
MODEL_REGISTRY: dict[str, ModelSpec] = {
"llama-8b": ModelSpec(
name="Llama 3.1 8B",
provider="local",
model_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
cost_per_1m_in=0.02, cost_per_1m_out=0.02,
max_context=128_000, latency_ms_p50=80
),
"gemini-flash": ModelSpec(
name="Gemini 1.5 Flash",
provider="google",
model_id="gemini-1.5-flash",
cost_per_1m_in=0.075, cost_per_1m_out=0.30,
max_context=1_000_000, latency_ms_p50=400
),
"gpt-4o-mini": ModelSpec(
name="GPT-4o-mini",
provider="openai",
model_id="gpt-4o-mini",
cost_per_1m_in=0.15, cost_per_1m_out=0.60,
max_context=128_000, latency_ms_p50=600
),
"gpt-4o": ModelSpec(
name="GPT-4o",
provider="openai",
model_id="gpt-4o",
cost_per_1m_in=2.50, cost_per_1m_out=10.0,
max_context=128_000, latency_ms_p50=1500
),
"claude-sonnet": ModelSpec(
name="Claude Sonnet 4",
provider="anthropic",
model_id="claude-sonnet-4-5",
cost_per_1m_in=3.0, cost_per_1m_out=15.0,
max_context=200_000, latency_ms_p50=1200
),
"claude-opus": ModelSpec(
name="Claude Opus 4",
provider="anthropic",
model_id="claude-opus-4-5",
cost_per_1m_in=15.0, cost_per_1m_out=75.0,
max_context=200_000, latency_ms_p50=3000
),
}
# Routing rules: (task_type, complexity) -> model_key
ROUTING_TABLE: dict[tuple, str | list[str]] = {
(TaskType.CLASSIFICATION, Complexity.TRIVIAL): "llama-8b",
(TaskType.CLASSIFICATION, Complexity.LOW): "gemini-flash",
(TaskType.EXTRACTION, Complexity.LOW): "gemini-flash",
(TaskType.EXTRACTION, Complexity.MEDIUM): "gpt-4o-mini",
(TaskType.SUMMARIZATION, Complexity.MEDIUM): "gpt-4o-mini",
(TaskType.SUMMARIZATION, Complexity.HIGH): "claude-sonnet",
(TaskType.REASONING, Complexity.MEDIUM): "gpt-4o",
(TaskType.REASONING, Complexity.HIGH): "claude-sonnet",
(TaskType.REASONING, Complexity.CRITICAL): "claude-opus",
(TaskType.CODE, Complexity.MEDIUM): "gpt-4o",
(TaskType.CODE, Complexity.HIGH): "gpt-4o",
(TaskType.RISK_ANALYSIS, Complexity.HIGH): "claude-sonnet",
(TaskType.RISK_ANALYSIS, Complexity.CRITICAL): ["claude-opus", "gpt-4o", "gemini-flash"],
(TaskType.TRADING_SIGNAL, Complexity.HIGH): "claude-sonnet",
(TaskType.TRADING_SIGNAL, Complexity.CRITICAL): ["claude-opus", "gpt-4o", "claude-sonnet"],
(TaskType.PORTFOLIO_OPT, Complexity.CRITICAL): ["claude-opus", "gpt-4o", "claude-sonnet"],
}
class ModelRouter:
def __init__(self, api_keys: dict[str, str]):
self.keys = api_keys
self.call_log: list[dict] = []
def select_model(self, task: TaskType, complexity: Complexity) -> str | list[str]:
"""Look up the routing table; fall back up the complexity ladder."""
for c in [complexity, Complexity(min(complexity.value + 1, 4))]:
key = (task, c)
if key in ROUTING_TABLE:
return ROUTING_TABLE[key]
return "claude-sonnet" # safe default
async def call_model(
self,
model_key: str,
prompt: str,
system: str = "",
json_mode: bool = False,
) -> dict:
spec = MODEL_REGISTRY[model_key]
result = {"model": model_key, "response": "", "tokens_in": 0, "tokens_out": 0}
if spec.provider == "anthropic":
result.update(await self._call_anthropic(spec, prompt, system, json_mode))
elif spec.provider == "openai":
result.update(await self._call_openai(spec, prompt, system, json_mode))
elif spec.provider == "google":
result.update(await self._call_google(spec, prompt, system))
elif spec.provider == "local":
result.update(await self._call_local(spec, prompt, system))
cost = (result["tokens_in"] * spec.cost_per_1m_in +
result["tokens_out"] * spec.cost_per_1m_out) / 1_000_000
result["cost_usd"] = cost
self.call_log.append(result)
return result
async def route(
self,
prompt: str,
task: TaskType,
complexity: Complexity,
system: str = "",
json_mode: bool = False,
) -> dict:
"""Route a task to the appropriate model(s) and return the result."""
model = self.select_model(task, complexity)
if isinstance(model, list):
# Ensemble: run all in parallel
tasks = [
self.call_model(m, prompt, system, json_mode)
for m in model
]
results = await asyncio.gather(*tasks)
return self._aggregate_ensemble(results, task)
else:
return await self.call_model(model, prompt, system, json_mode)
def _aggregate_ensemble(self, results: list[dict], task: TaskType) -> dict:
"""Aggregate ensemble results; return disagreement score as uncertainty."""
responses = [r.get("response", "") for r in results]
total_cost = sum(r.get("cost_usd", 0) for r in results)
if task in (TaskType.TRADING_SIGNAL, TaskType.RISK_ANALYSIS):
# Parse structured decisions from each model
decisions = []
for resp in responses:
try:
d = json.loads(resp) if isinstance(resp, str) else resp
decisions.append(d.get("decision", "hold"))
except Exception:
decisions.append("hold")
from collections import Counter
vote_counts = Counter(decisions)
majority_decision = vote_counts.most_common(1)[0][0]
agreement_rate = vote_counts.most_common(1)[0][1] / len(decisions)
uncertainty = 1.0 - agreement_rate
return {
"model": "ensemble",
"response": majority_decision,
"decisions": decisions,
"agreement_rate": agreement_rate,
"uncertainty": uncertainty,
"cost_usd": total_cost,
"should_pause": uncertainty > 0.5, # models disagree too much
}
# For other tasks, return the longest/most detailed response
best = max(results, key=lambda r: len(str(r.get("response", ""))))
best["cost_usd"] = total_cost
return best
Provider API Clients
Each provider requires a slightly different API structure. The router abstracts these behind a uniform interface; the `_call_*` coroutines below are methods of `ModelRouter`.
async def _call_anthropic(self, spec: ModelSpec, prompt: str, system: str, json_mode: bool) -> dict:
payload = {
"model": spec.model_id,
"max_tokens": 2048,
"messages": [{"role": "user", "content": prompt}],
}
if system:
payload["system"] = system
if json_mode:
payload["system"] = (payload.get("system", "") +
"\nRespond with valid JSON only. No markdown, no explanation.").strip()
async with httpx.AsyncClient() as c:
resp = await c.post(
"https://api.anthropic.com/v1/messages",
headers={
"x-api-key": self.keys["anthropic"],
"anthropic-version": "2023-06-01",
"content-type": "application/json",
},
json=payload, timeout=60.0
)
data = resp.json()
return {
"response": data["content"][0]["text"],
"tokens_in": data["usage"]["input_tokens"],
"tokens_out": data["usage"]["output_tokens"],
}
async def _call_openai(self, spec: ModelSpec, prompt: str, system: str, json_mode: bool) -> dict:
messages = []
if system:
messages.append({"role": "system", "content": system})
messages.append({"role": "user", "content": prompt})
payload = {"model": spec.model_id, "messages": messages, "max_tokens": 2048}
if json_mode:
payload["response_format"] = {"type": "json_object"}
async with httpx.AsyncClient() as c:
resp = await c.post(
"https://api.openai.com/v1/chat/completions",
headers={"Authorization": f"Bearer {self.keys['openai']}"},
json=payload, timeout=60.0
)
data = resp.json()
return {
"response": data["choices"][0]["message"]["content"],
"tokens_in": data["usage"]["prompt_tokens"],
"tokens_out": data["usage"]["completion_tokens"],
}
async def _call_google(self, spec: ModelSpec, prompt: str, system: str) -> dict:
full_prompt = f"{system}\n\n{prompt}" if system else prompt
async with httpx.AsyncClient() as c:
resp = await c.post(
f"https://generativelanguage.googleapis.com/v1beta/models/{spec.model_id}:generateContent",
params={"key": self.keys["google"]},
json={"contents": [{"parts": [{"text": full_prompt}]}]},
timeout=60.0
)
data = resp.json()
text = data["candidates"][0]["content"]["parts"][0]["text"]
usage = data.get("usageMetadata", {})
return {
"response": text,
"tokens_in": usage.get("promptTokenCount", 0),
"tokens_out": usage.get("candidatesTokenCount", 0),
}
async def _call_local(self, spec: ModelSpec, prompt: str, system: str) -> dict:
"""Call local Ollama or vLLM instance."""
messages = []
if system:
messages.append({"role": "system", "content": system})
messages.append({"role": "user", "content": prompt})
async with httpx.AsyncClient() as c:
resp = await c.post(
"http://localhost:11434/api/chat",
json={"model": "llama3.1:8b", "messages": messages, "stream": False},
timeout=120.0
)
data = resp.json()
return {
"response": data["message"]["content"],
"tokens_in": data.get("prompt_eval_count", 0),
"tokens_out": data.get("eval_count", 0),
}
Ensemble Decision Making
For high-stakes financial decisions, running the same prompt through multiple models and aggregating their votes is both more accurate and more auditable than a single model call. Model disagreement is a first-class signal.
TRADE_DECISION_PROMPT = """
You are a financial AI agent making a trading decision.
Asset: {symbol}
Current price: ${price}
24h change: {change_24h}%
NLP sentiment score: {nlp_score} (-1 bearish to +1 bullish)
RSI (14): {rsi}
Position size limit: ${max_position} USDC
Based on this data, decide whether to BUY, SELL, or HOLD.
Respond with JSON only:
{{
"decision": "buy|sell|hold",
"confidence": 0.0-1.0,
"size_usdc": numeric,
"reasoning": "one sentence"
}}
"""
async def ensemble_trade_decision(
router: ModelRouter,
symbol: str,
market_data: dict,
nlp_score: float,
) -> dict:
prompt = TRADE_DECISION_PROMPT.format(
symbol=symbol,
price=market_data["price"],
change_24h=market_data["change_24h"],
nlp_score=round(nlp_score, 3),
rsi=market_data.get("rsi", 50),
max_position=500,
)
result = await router.route(
prompt=prompt,
task=TaskType.TRADING_SIGNAL,
complexity=Complexity.CRITICAL,
json_mode=True,
)
print(f"[ENSEMBLE] Decision: {result['response']}")
print(f"[ENSEMBLE] Agreement: {result.get('agreement_rate', 1.0):.0%}")
print(f"[ENSEMBLE] Uncertainty: {result.get('uncertainty', 0):.2f}")
print(f"[ENSEMBLE] Should pause: {result.get('should_pause', False)}")
print(f"[ENSEMBLE] Cost: ${result.get('cost_usd', 0):.4f}")
return result
If the ensemble uncertainty exceeds 0.5 (less than 50% agreement), it indicates the market situation is genuinely ambiguous. The agent should either reduce position size by 50% or flag for human review rather than defaulting to the majority vote.
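That policy can be expressed as a small sizing helper. This is an illustrative sketch: the 0.5 threshold matches the `should_pause` logic above, but the 50% haircut and linear scaling below it are example choices, not part of the router.

```python
def size_with_uncertainty(base_size_usd: float, uncertainty: float,
                          pause_threshold: float = 0.5) -> dict:
    """Apply the disagreement policy: above the threshold, halve the
    position and flag for human review; below it, scale down linearly."""
    if uncertainty > pause_threshold:
        return {"size_usd": base_size_usd * 0.5, "needs_review": True}
    return {"size_usd": base_size_usd * (1.0 - uncertainty), "needs_review": False}

print(size_with_uncertainty(100.0, 0.0))    # full size, no review
print(size_with_uncertainty(100.0, 0.25))   # scaled down, no review
print(size_with_uncertainty(100.0, 0.67))   # halved and flagged
```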
Claude vs. GPT-4 vs. Gemini vs. Llama: Financial Task Benchmarks
Different models excel at different financial subtasks. These benchmarks reflect internal Purple Flea testing across 1,000+ financial prompts in early 2026.
| Task | Claude Opus 4 | GPT-4o | Gemini Flash | Llama 3.1 8B |
|---|---|---|---|---|
| Multi-step risk reasoning | 95% | 88% | 61% | 54% |
| JSON extraction accuracy | 97% | 98% | 94% | 88% |
| Earnings transcript analysis | 94% | 89% | 71% | 62% |
| Python code gen (financial) | 91% | 94% | 72% | 76% |
| News sentiment classification | 89% | 87% | 85% | 80% |
| Portfolio optimization | 92% | 85% | 58% | 49% |
| Simple data categorization | 96% | 95% | 93% | 89% |
| Regulatory text interpretation | 93% | 84% | 60% | 51% |
The key insight: for simple classification tasks (the news sentiment and simple data categorization rows), the quality gap between frontier and cheap models is small (6-10%), but the input-token cost gap is up to 750x. Route aggressively to cheaper models for routine work.
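Those cost ratios follow directly from the per-million-token input prices in `MODEL_REGISTRY`; a quick check:

```python
# Input-token prices (USD per 1M tokens) from the model registry above.
OPUS_IN = 15.0    # Claude Opus 4
FLASH_IN = 0.075  # Gemini 1.5 Flash
LLAMA_IN = 0.02   # local Llama 3.1 8B (amortized hardware estimate)

print(f"Opus vs Flash: {OPUS_IN / FLASH_IN:.0f}x")  # 200x
print(f"Opus vs Llama: {OPUS_IN / LLAMA_IN:.0f}x")  # 750x
```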
LLM Orchestration Patterns
Beyond simple routing, multi-model systems support several powerful orchestration patterns for complex financial workflows.
Sequential Pipeline (Cheap → Expensive)
A cheap model does a first-pass filter or draft; only if it reports low confidence does the task escalate to a more expensive model. This cuts frontier-model usage by 60-80% on workloads with a high rate of easy cases.
Specialization with Fusion
Route sub-components of a complex task to specialist models (e.g., GPT-4o for code, Claude for reasoning, Gemini for summarization) and fuse the outputs with a final model call. Beats any single model on multi-faceted tasks.
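A minimal sketch of the fan-out-and-fuse shape. The stub `specialist` coroutine below stands in for real `router.call_model` calls, and its outputs are illustrative, not a provider API:

```python
import asyncio

async def specialist(model: str, subtask: str) -> dict:
    """Stand-in for router.call_model; a real call would hit the provider."""
    await asyncio.sleep(0)  # placeholder for network latency
    return {"model": model, "output": f"[{model}] result for: {subtask}"}

async def fuse(task: str) -> str:
    # Fan out sub-components of the task to specialist models in parallel...
    parts = await asyncio.gather(
        specialist("gpt-4o", f"write code for: {task}"),
        specialist("claude-sonnet", f"reason about risks of: {task}"),
        specialist("gemini-flash", f"summarize context for: {task}"),
    )
    # ...then fuse the drafts with a final model call.
    merged = "\n".join(p["output"] for p in parts)
    final = await specialist("claude-opus", f"fuse these drafts:\n{merged}")
    return final["output"]

print(asyncio.run(fuse("rebalance portfolio")))
```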
Adversarial Debate
One model argues for a trade, another argues against, and a judge model evaluates the debate. Useful for high-conviction decisions where confirmation bias is a risk. It roughly triples cost but catches errors that a single model misses.
Self-Consistency Voting
Run the same prompt through the same model multiple times at temperature > 0 and aggregate the responses. Effective when only one model is available but reliability needs improvement; 3-5 samples is typical.
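A sketch of self-consistency voting. The `sample_decision` function is a hypothetical stand-in for one temperature > 0 model call (a real implementation would pass a temperature parameter to the provider API, which `call_model` above does not currently expose):

```python
import random
from collections import Counter

def sample_decision(rng: random.Random) -> str:
    """Stand-in for one stochastic model call at temperature > 0."""
    return rng.choices(["buy", "hold"], weights=[0.7, 0.3])[0]

def self_consistency_vote(n_samples: int = 5, seed: int = 7) -> dict:
    """Sample the same prompt n times and take the majority decision."""
    rng = random.Random(seed)
    votes = [sample_decision(rng) for _ in range(n_samples)]
    counts = Counter(votes)
    decision, top = counts.most_common(1)[0]
    return {"decision": decision, "agreement": top / n_samples, "votes": votes}

print(self_consistency_vote())
```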
async def sequential_escalation(
router: ModelRouter,
prompt: str,
task: TaskType,
confidence_threshold: float = 0.75,
) -> dict:
"""
Run cheap model first; escalate to expensive only if confidence is low.
Assumes model response includes a confidence field.
"""
# Stage 1: cheap model
result = await router.route(prompt, task, Complexity.LOW, json_mode=True)
try:
data = json.loads(result["response"])
confidence = float(data.get("confidence", 0))
except Exception:
confidence = 0.0
if confidence >= confidence_threshold:
print(f"[ESCALATION] Cheap model confident ({confidence:.0%}), no escalation")
result["escalated"] = False
return result
# Stage 2: escalate
print(f"[ESCALATION] Low confidence ({confidence:.0%}), escalating to frontier model")
result = await router.route(prompt, task, Complexity.CRITICAL, json_mode=True)
result["escalated"] = True
return result
async def adversarial_debate(
router: ModelRouter,
asset: str,
trade_thesis: str,
) -> dict:
"""Bull/bear debate between two models; judge decides."""
bull_prompt = f"Argue STRONGLY for buying {asset}. Thesis to defend: {trade_thesis}. Be concise, 3 bullet points."
bear_prompt = f"Argue STRONGLY against buying {asset}. Counter this thesis: {trade_thesis}. Be concise, 3 bullet points."
bull, bear = await asyncio.gather(
router.call_model("claude-sonnet", bull_prompt),
router.call_model("gpt-4o", bear_prompt),
)
judge_prompt = f"""
Bull case:
{bull['response']}
Bear case:
{bear['response']}
As an impartial judge, evaluate both arguments and return JSON:
{{"winner": "bull|bear|draw", "confidence": 0.0-1.0, "key_reason": "one sentence"}}
"""
judgment = await router.call_model("claude-opus", judge_prompt, json_mode=True)
total_cost = bull["cost_usd"] + bear["cost_usd"] + judgment["cost_usd"]
return {
"judgment": json.loads(judgment["response"]),
"bull_argument": bull["response"],
"bear_argument": bear["response"],
"total_cost_usd": total_cost,
}
Purple Flea Integration Example
The following shows a complete integration: the multi-model router combined with Purple Flea's trading API to form a decision-making agent that uses model disagreement to size positions.
import asyncio
import httpx
import json
from dataclasses import dataclass
PURPLE_FLEA_API = "https://api.purpleflea.com"
@dataclass
class AgentConfig:
pf_api_key: str = "pf_live_<your_key>"
anthropic_key: str = ""
openai_key: str = ""
google_key: str = ""
base_position_usd: float = 100.0
max_position_usd: float = 500.0
async def run_decision_cycle(config: AgentConfig, symbol: str):
"""Full cycle: fetch market data โ multi-model decision โ execute trade."""
router = ModelRouter({
"anthropic": config.anthropic_key,
"openai": config.openai_key,
"google": config.google_key,
})
# 1. Fetch market data from Purple Flea
async with httpx.AsyncClient() as c:
resp = await c.get(
f"{PURPLE_FLEA_API}/v1/market/summary/{symbol}",
headers={"Authorization": f"Bearer {config.pf_api_key}"}
)
market = resp.json()
# 2. Cheap model: classify market regime
regime_prompt = f"""
Price: {market['price']}, RSI: {market['rsi']}, Volume ratio: {market['volume_ratio']:.2f}
Classify market regime: trending_up | trending_down | ranging | volatile
JSON only: {{"regime": "...", "confidence": 0.0-1.0}}
"""
regime = await router.route(
regime_prompt, TaskType.CLASSIFICATION, Complexity.LOW, json_mode=True
)
regime_data = json.loads(regime["response"])
print(f"[REGIME] {regime_data['regime']} ({regime_data['confidence']:.0%}) | cost: ${regime['cost_usd']:.5f}")
# 3. High-stakes: ensemble trade decision
ensemble = await ensemble_trade_decision(router, symbol, market, nlp_score=0.0)
if ensemble.get("should_pause"):
print("[PAUSED] High model disagreement: no trade")
return
# _aggregate_ensemble returns the bare majority decision string, not JSON
raw = ensemble["response"]
try:
decision_data = json.loads(raw) if isinstance(raw, str) else {}
except (json.JSONDecodeError, TypeError):
decision_data = {"decision": raw}
decision = decision_data.get("decision", "hold")
model_confidence = decision_data.get("confidence", ensemble.get("agreement_rate", 0.5))
ensemble_agreement = ensemble.get("agreement_rate", 1.0)
if decision == "hold":
print("[HOLD] Ensemble decided to hold")
return
# 4. Scale position by (model_confidence * ensemble_agreement)
conviction = model_confidence * ensemble_agreement
position_usd = config.base_position_usd + (
(config.max_position_usd - config.base_position_usd) * conviction
)
print(f"[TRADE] {decision.upper()} {symbol} | size=${position_usd:.0f} | conviction={conviction:.0%}")
# 5. Execute via Purple Flea
async with httpx.AsyncClient() as c:
trade_resp = await c.post(
f"{PURPLE_FLEA_API}/v1/trade/order",
headers={"Authorization": f"Bearer {config.pf_api_key}"},
json={
"symbol": symbol,
"side": decision,
"amount_usdc": position_usd,
"order_type": "market",
"source": "multi_model_router",
"metadata": {
"regime": regime_data["regime"],
"ensemble_agreement": ensemble_agreement,
"models_used": ["claude-opus", "gpt-4o", "claude-sonnet"],
"total_llm_cost_usd": ensemble.get("cost_usd", 0),
}
},
timeout=15.0
)
print(f"[ORDER] {trade_resp.json()}")
# Run
config = AgentConfig(
pf_api_key="pf_live_<your_key>",
anthropic_key="sk-ant-...",
openai_key="sk-...",
google_key="AI...",
)
asyncio.run(run_decision_cycle(config, "BTC"))
Cost Optimization in Practice
A well-tuned multi-model system running 1,000 decision cycles per day can cost under $5/day in LLM fees by routing aggressively to cheap models. Here is a real breakdown from a Purple Flea test agent over 30 days:
| Model | Calls / Day | Avg Tokens | Cost / Day | % of Calls |
|---|---|---|---|---|
| Llama 3.1 8B (local) | 620 | 180 | $0.002 | 62% |
| Gemini Flash | 210 | 320 | $0.016 | 21% |
| GPT-4o-mini | 100 | 500 | $0.038 | 10% |
| Claude Sonnet | 55 | 800 | $0.66 | 5.5% |
| Claude Opus (ensemble) | 15 | 1,200 | $1.35 | 1.5% |
| Total | 1,000 | - | $2.07 | 100% |
The same workload run entirely on Claude Opus would cost ~$90/day. Multi-model routing achieves a 97.7% cost reduction with only minor quality degradation on routine tasks.
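Those figures can be sanity-checked from the table above (the per-model daily costs and the ~$90/day all-Opus baseline are the quoted numbers):

```python
# Daily cost per model, taken from the 30-day breakdown table.
per_model_daily = {
    "llama-8b": 0.002,
    "gemini-flash": 0.016,
    "gpt-4o-mini": 0.038,
    "claude-sonnet": 0.66,
    "claude-opus": 1.35,
}
routed = sum(per_model_daily.values())  # total routed cost per day
all_opus = 90.0                         # quoted all-Opus baseline
savings = 1 - routed / all_opus
print(f"routed=${routed:.2f}/day, savings={savings:.1%}")  # ~$2.07/day, ~97.7%
```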
Monitoring and Observability
class RouterObservability:
"""Lightweight cost and quality tracking for the multi-model router."""
def __init__(self):
self.totals: dict[str, dict] = {}
def record(self, result: dict):
model = result.get("model", "unknown")
if model not in self.totals:
self.totals[model] = {"calls": 0, "cost_usd": 0, "tokens_in": 0, "tokens_out": 0}
t = self.totals[model]
t["calls"] += 1
t["cost_usd"] += result.get("cost_usd", 0)
t["tokens_in"] += result.get("tokens_in", 0)
t["tokens_out"] += result.get("tokens_out", 0)
def report(self) -> dict:
total_cost = sum(v["cost_usd"] for v in self.totals.values())
total_calls = sum(v["calls"] for v in self.totals.values())
return {
"total_cost_usd": round(total_cost, 4),
"total_calls": total_calls,
"cost_per_call": round(total_cost / max(total_calls, 1), 6),
"by_model": {
m: {**v, "pct_of_cost": f"{v['cost_usd']/max(total_cost,0.001)*100:.1f}%"}
for m, v in self.totals.items()
}
}
Conclusion
Multi-model agent systems represent the next evolution in financial AI: not a single generalist model making all decisions, but a coordinated team of specialists, routers, and ensemble voters, each component tuned for cost, speed, and accuracy on its specific task.
The patterns described here (sequential escalation, adversarial debate, disagreement-based uncertainty, and cost-tier routing) can cut LLM operating costs by 90%+ while improving decision quality on high-stakes trades by adding model diversity as a risk management tool.
Get your Purple Flea API key at purpleflea.com/register. New agents can use the Faucet to claim testnet funds and paper-trade multi-model decisions before going live. The NLP trading signals guide pairs well with the multi-model router for a complete autonomous trading system.