Why Loki for Agent Financial Operations?
AI agents running on Purple Flea — executing trades, managing wallets, running casino sessions, and settling escrow payments — generate a continuous stream of operational events. Without structured log aggregation, debugging a failed trade or diagnosing an unusual spike in API errors means SSHing into a server and grepping through flat text files. That doesn't scale to agents running 24/7 across multiple strategies.
Grafana Loki is the ideal log aggregation backend for agent financial workloads because:
Label-based indexing
Index by agent_id, strategy, symbol, and environment without indexing the full log text. Query by any label combination instantly.
Cost-efficient at scale
Loki stores log text compressed and indexes only the labels. Running 50 agents each generating 10K events/day costs cents in storage.
Native Grafana integration
Correlate log streams directly with your Prometheus metrics dashboards. Click a spike on a chart, see the exact log events that caused it.
LogQL is powerful
Parse JSON logs, extract fields, compute rates, filter by severity — all in one query language purpose-built for log aggregation.
Ruler for alerting
Define alert rules directly in LogQL. Fire PagerDuty/Slack when API error rate exceeds 5% or escrow failure is detected.
Multi-agent correlation
Trace a single trade_id across wallet deduction, order placement, and escrow settlement events — even when they happen in different agent processes.
Architecture Overview
The recommended observability stack for Purple Flea agents is straightforward:
Each agent writes structured JSON logs to stdout or a log file. Promtail reads those logs, attaches labels (agent_id, strategy, env), and pushes them to Loki. Grafana reads from Loki using LogQL and renders dashboards, log panels, and alert rules.
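For local development, the whole stack can be stood up with Docker Compose. The sketch below makes several assumptions: the image tags are illustrative (pin the versions you actually test against), the Loki image's bundled `local-config.yaml` is used as-is, and anonymous Grafana access is enabled purely for convenience, never in production:

```yaml
# docker-compose.yml — minimal Loki + Promtail + Grafana stack (sketch, not production)
version: "3.8"
services:
  loki:
    image: grafana/loki:3.0.0
    ports: ["3100:3100"]
    command: -config.file=/etc/loki/local-config.yaml  # default config shipped in the image

  promtail:
    image: grafana/promtail:3.0.0
    volumes:
      - /var/log/purple-flea:/var/log/purple-flea:ro   # agent log files, read-only
      - ./promtail-config.yml:/etc/promtail/config.yml # config from this guide
    command: -config.file=/etc/promtail/config.yml

  grafana:
    image: grafana/grafana:11.0.0
    ports: ["3000:3000"]
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true  # dev convenience only
```

With this running, add Loki as a Grafana datasource at `http://loki:3100` and the Explore view is immediately usable.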
Structured Logging for Agent Operations
The foundation of a useful Loki setup is structured log output. Flat text logs are nearly impossible to query meaningfully — structured JSON logs let you filter, parse, and aggregate on any field.
Every log event from a Purple Flea agent should include these standard fields:
| Field | Type | Example | Purpose |
|---|---|---|---|
| `timestamp` | ISO 8601 | `"2026-03-07T14:22:01.342Z"` | Event time (UTC always) |
| `level` | string | `"INFO"`, `"WARN"`, `"ERROR"` | Severity for filtering |
| `agent_id` | string | `"basis-agent-01"` | Identifies the agent instance |
| `strategy` | string | `"basis_convergence"` | Strategy type for grouping |
| `event` | string | `"order_placed"` | Structured event name |
| `symbol` | string | `"BTCUSDT"` | Trading pair (optional) |
| `trade_id` | string UUID | `"a3f92b..."` | Correlation ID for multi-step trades |
| `amount_usd` | float | `4250.00` | USD notional (for anomaly detection) |
| `latency_ms` | int | `143` | API call duration for SLO tracking |
| `error` | string | `"rate_limit_exceeded"` | Error code when applicable |
| `status_code` | int | `429` | HTTP status for API calls |
Here is a representative sample of well-structured Purple Flea agent log lines. Each line is a single JSON object, which is what the Promtail `json` pipeline stage below expects:

{"timestamp":"2026-03-07T14:22:01.580Z","level":"INFO","event":"order_placed","agent_id":"basis-agent-01","symbol":"BTCUSDT","side":"buy","amount_usd":5000.00,"trade_id":"a3f92b44","latency_ms":143}
{"timestamp":"2026-03-07T14:22:01.601Z","level":"INFO","event":"order_placed","agent_id":"basis-agent-01","symbol":"BTCUSDT-PERP","side":"sell","amount_usd":5000.00,"trade_id":"a3f92b44","latency_ms":89}
{"timestamp":"2026-03-07T14:30:00.001Z","level":"WARN","event":"rate_limit_hit","agent_id":"casino-agent-07","endpoint":"/v1/casino/bet","status_code":429,"retry_after_ms":1000}
{"timestamp":"2026-03-07T14:32:14.777Z","level":"ERROR","event":"escrow_failed","agent_id":"escrow-settler-02","escrow_id":"esc_f9a211","error":"insufficient_funds","amount_usd":1250.00,"counterparty":"agent-99"}
Python: Structured Logging Setup
Two excellent Python libraries for structured logging are structlog (highly configurable, explicit context binding) and loguru (minimal boilerplate, great for smaller agent scripts). Both output JSON that Loki can parse natively.
Using structlog
pip install structlog
import sys

import structlog

def configure_structlog():
    """Configure structlog for Purple Flea agent structured JSON output."""
    structlog.configure(
        processors=[
            # Add timestamp in ISO 8601 UTC
            structlog.processors.TimeStamper(fmt="iso", utc=True),
            # Add log level
            structlog.processors.add_log_level,
            # Add caller info (file, line)
            structlog.processors.CallsiteParameterAdder(
                [structlog.processors.CallsiteParameter.FILENAME,
                 structlog.processors.CallsiteParameter.LINENO]
            ),
            # Render as JSON
            structlog.processors.JSONRenderer(),
        ],
        context_class=dict,
        logger_factory=structlog.PrintLoggerFactory(file=sys.stdout),
    )

configure_structlog()
log = structlog.get_logger()

# Bind agent context once — propagates to all subsequent calls
log = log.bind(
    agent_id="basis-agent-01",
    strategy="basis_convergence",
    env="production",
)

# Event logging examples
log.info("scan_complete", opportunities_found=3, scan_duration_ms=218)
log.info("order_placed", symbol="BTCUSDT", side="buy", amount_usd=5000.0,
         trade_id="a3f92b44", latency_ms=143)
log.warning("rate_limit_hit", endpoint="/v1/trading/order", status_code=429,
            retry_after_ms=1000)
log.error("order_failed", symbol="ETHUSDT", error="insufficient_margin",
          amount_usd=3000.0, trade_id="b8c01d22")
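The latency_ms field shown above is easiest to keep accurate with a small timing helper around each API call. Below is a sketch; `timed_event` is our own name, not a structlog API, and it works with any logger object exposing .info() and .error():

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_event(log, event, **fields):
    """Wrap an API call: emit `event` with measured latency_ms on success,
    or `<event>_failed` with the error message if an exception escapes.

    Works with any structlog-style logger exposing .info() / .error().
    """
    start = time.monotonic()
    try:
        yield
    except Exception as exc:
        log.error(f"{event}_failed", error=str(exc),
                  latency_ms=int((time.monotonic() - start) * 1000), **fields)
        raise
    else:
        log.info(event,
                 latency_ms=int((time.monotonic() - start) * 1000), **fields)
```

Usage (with `place_order` standing in for a hypothetical API client call): `with timed_event(log, "order_placed", symbol="BTCUSDT"): place_order(...)`. This keeps latency measurement in one place instead of scattered `time.monotonic()` pairs.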
Using loguru (Minimal Setup)
pip install loguru
import json
import sys

from loguru import logger

AGENT_ID = "basis-agent-01"
STRATEGY = "basis_convergence"

def json_sink(message):
    """Custom JSON sink for loguru — outputs one JSON object per line."""
    record = message.record
    log_entry = {
        "timestamp": record["time"].isoformat(),
        "level": record["level"].name,
        "agent_id": AGENT_ID,
        "strategy": STRATEGY,
        "event": record["message"],
        **record["extra"],  # Any .bind() context and per-call keyword fields
    }
    print(json.dumps(log_entry), flush=True)

# Remove default handler, add JSON handler
logger.remove()
logger.add(json_sink, level="INFO")

# Optional: also write to rotating file (Promtail can read this).
# Note: serialize=True uses loguru's own JSON envelope, which nests fields
# under "record" — adjust your Promtail json expressions if you ship this file.
logger.add(
    "/var/log/purple-flea/basis-agent-01.log",
    level="INFO",
    rotation="100 MB",
    retention="30 days",
    serialize=True,  # loguru's built-in JSON serialization
)

# Usage — bind() returns a contextualized logger
agent_log = logger.bind(agent_id=AGENT_ID, strategy=STRATEGY)
agent_log.info("agent_started", capital_usd=50000.0, symbols=["BTC", "ETH", "SOL"])
agent_log.info("order_placed", symbol="BTCUSDT", side="buy", amount_usd=5000.0,
               trade_id="a3f92b44", latency_ms=143)
agent_log.warning("high_basis_blowout_risk", symbol="SOLANA", current_basis_pct=0.92,
                  entry_basis_pct=0.31)
agent_log.error("emergency_exit_triggered", symbol="AVAX", reason="margin_too_low",
                margin_ratio=1.28)
Always log to stdout first. When the agent runs under systemd, stdout lands in the journal, which Promtail can scrape; for file-based shipping, add a second file sink as shown above. Logging to stdout keeps your agent container-friendly (it works in Docker, Kubernetes, and bare PM2 deployments without filesystem changes).
Promtail Configuration for Agent Log Shipping
Promtail is the official Loki log shipper. It tails files or reads from systemd journal, attaches labels, and pushes to Loki. Here is a complete promtail-config.yml for a multi-agent Purple Flea deployment:
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/promtail-positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push
    # If using Grafana Cloud Loki:
    # url: https://logs-prod-us-central1.grafana.net/loki/api/v1/push
    # basic_auth:
    #   username: YOUR_GRAFANA_CLOUD_USER_ID
    #   password: YOUR_GRAFANA_CLOUD_API_KEY

scrape_configs:
  # ── Trading agents log files ──────────────────────────────────────────────
  - job_name: purple-flea-trading-agents
    static_configs:
      - targets:
          - localhost
        labels:
          job: trading-agents
          env: production
          service: purple-flea
          __path__: /var/log/purple-flea/trading/*.log
    pipeline_stages:
      # Parse JSON log lines. The timestamp field must be extracted here so
      # the timestamp stage below can read it from the extracted map.
      - json:
          expressions:
            timestamp: timestamp
            level: level
            agent_id: agent_id
            strategy: strategy
            event: event
            symbol: symbol
            trade_id: trade_id
            amount_usd: amount_usd
            latency_ms: latency_ms
            error: error
      # Promote parsed fields to Loki labels (label-indexed, queryable without scan)
      - labels:
          level:
          agent_id:
          strategy:
          event:
      # Set log timestamp from the JSON field (not the ingest time)
      - timestamp:
          source: timestamp
          format: RFC3339Nano

  # ── Casino agents ─────────────────────────────────────────────────────────
  - job_name: purple-flea-casino-agents
    static_configs:
      - targets: [localhost]
        labels:
          job: casino-agents
          env: production
          service: purple-flea
          __path__: /var/log/purple-flea/casino/*.log
    pipeline_stages:
      - json:
          expressions:
            timestamp: timestamp
            level: level
            agent_id: agent_id
            event: event
            session_id: session_id
            bet_amount_usd: bet_amount_usd
            game: game
      - labels:
          level:
          agent_id:
          event:
          game:
      - timestamp:
          source: timestamp
          format: RFC3339Nano

  # ── Escrow agents ─────────────────────────────────────────────────────────
  - job_name: purple-flea-escrow-agents
    static_configs:
      - targets: [localhost]
        labels:
          job: escrow-agents
          env: production
          service: purple-flea
          __path__: /var/log/purple-flea/escrow/*.log
    pipeline_stages:
      - json:
          expressions:
            timestamp: timestamp
            level: level
            agent_id: agent_id
            event: event
            escrow_id: escrow_id
            amount_usd: amount_usd
            counterparty: counterparty
            error: error
      - labels:
          level:
          agent_id:
          event:
      - timestamp:
          source: timestamp
          format: RFC3339Nano

  # ── Wallet agents ─────────────────────────────────────────────────────────
  - job_name: purple-flea-wallet-agents
    static_configs:
      - targets: [localhost]
        labels:
          job: wallet-agents
          env: production
          service: purple-flea
          __path__: /var/log/purple-flea/wallet/*.log
    pipeline_stages:
      - json:
          expressions:
            timestamp: timestamp
            level: level
            agent_id: agent_id
            event: event
            chain: chain
            tx_hash: tx_hash
            amount_usd: amount_usd
      - labels:
          level:
          agent_id:
          event:
          chain:
      - timestamp:
          source: timestamp
          format: RFC3339Nano
Label cardinality warning: Do not add high-cardinality fields like trade_id, tx_hash, or amount_usd as Loki labels. Use them as parsed fields within log lines instead. High-cardinality labels cause Loki to create thousands of streams and significantly degrade performance.
Installing Promtail
# Download latest Promtail binary (replace VERSION with latest)
PROMTAIL_VERSION="3.0.0"
wget "https://github.com/grafana/loki/releases/download/v${PROMTAIL_VERSION}/promtail-linux-amd64.zip"
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail
# Install the config where the systemd unit below expects it
sudo mkdir -p /etc/promtail
sudo cp promtail-config.yml /etc/promtail/promtail-config.yml
# Create log directories for agents
sudo mkdir -p /var/log/purple-flea/{trading,casino,escrow,wallet}
sudo chown -R "$(whoami)": /var/log/purple-flea  # Let the agent user write; avoid world-writable 777
# Run Promtail as a systemd service
sudo tee /etc/systemd/system/promtail.service <<EOF
[Unit]
Description=Promtail - Loki log shipper for Purple Flea agents
After=network.target
[Service]
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/promtail-config.yml
Restart=on-failure
RestartSec=5s
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable promtail
sudo systemctl start promtail
sudo systemctl status promtail
LogQL Queries for Agent Debugging
LogQL is Loki's query language. It reads like a combination of PromQL and grep. Here are the most useful queries for Purple Flea agent operations — ready to paste into Grafana Explore.
Error Rate Queries
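The following queries are sketches built on the labels defined in the Promtail configuration above (job, agent_id, level, event); adjust the selectors if your label scheme differs.

```logql
# All ERROR events across every agent job
{job=~".*-agents", level="ERROR"}

# Error rate (events/sec) per agent over 5 minutes
sum(rate({job=~".*-agents", level="ERROR"}[5m])) by (agent_id)

# Error percentage per agent: errors divided by all events
(
  sum(rate({job=~".*-agents", level="ERROR"}[5m])) by (agent_id)
  /
  sum(rate({job=~".*-agents"}[5m])) by (agent_id)
) * 100

# Occurrences of a specific error code in the last hour
sum(count_over_time({job="trading-agents"} | json | error="rate_limit_exceeded" [1h]))
```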
Trade Latency Queries
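Latency queries use `unwrap` on the latency_ms field parsed from the JSON body; these are sketches against the label scheme defined earlier.

```logql
# P95 order-placement latency per agent over 10 minutes
quantile_over_time(0.95,
  {job="trading-agents", event="order_placed"}
  | json
  | unwrap latency_ms [10m]
) by (agent_id)

# Average order-placement latency per agent
avg_over_time(
  {job="trading-agents", event="order_placed"} | json | unwrap latency_ms [5m]
) by (agent_id)

# Individual slow orders (over 1 second)
{job="trading-agents", event="order_placed"} | json | latency_ms > 1000
```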
Wallet Event Queries
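Wallet queries lean on the chain label and the parsed tx_hash and amount_usd fields; the specific chain name and hash below are illustrative values, not fixtures from this deployment.

```logql
# All wallet events on one chain (chain value is illustrative)
{job="wallet-agents", chain="ethereum"}

# Total USD moved through wallets per chain over 1 hour
sum(sum_over_time(
  {job="wallet-agents"} | json | unwrap amount_usd [1h]
)) by (chain)

# Look up a specific transaction by hash (tx_hash is a parsed field, not a label)
{job="wallet-agents"} | json | tx_hash = "0xabc123"
```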
Escrow Event Queries
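Escrow queries combine the event label (promoted in the Promtail pipeline) with parsed fields; the event name escrow_settled is an assumed counterpart to the escrow_failed events shown earlier.

```logql
# All failed escrow settlements
{job="escrow-agents", event="escrow_failed"}

# Escrow failures grouped by error code over 1 hour
sum(count_over_time(
  {job="escrow-agents", event="escrow_failed"} | json [1h]
)) by (error)

# Full lifecycle of one escrow by ID (parsed field, not a label)
{job="escrow-agents"} | json | escrow_id = "esc_f9a211"

# Settlement volume in USD over 24 hours (assumes an escrow_settled event)
sum(sum_over_time(
  {job="escrow-agents", event="escrow_settled"} | json | unwrap amount_usd [24h]
))
```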
Grafana Explore for Agent Debugging
Grafana Explore is the primary interface for ad-hoc log investigation. The key workflow for debugging a specific agent incident:
1. Scope the time range. In Grafana Explore, set the time range to the window when the incident occurred. Use the "Last 1h" or custom range picker. Narrow the window once you identify the anomaly period.
2. Filter to the failing agent. Begin with {job="trading-agents", agent_id="basis-agent-01", level="ERROR"} to see only errors from the specific agent. Expand to WARN if errors are sparse.
3. Pivot on the correlation ID. Add | json and look for the trade_id field on the first error event. Now filter on that trade_id: | trade_id = "a3f92b44" to see the complete lifecycle of that specific trade.
4. Trace across agents. The same trade_id may appear in wallet agent logs (for the funding transaction) and escrow agent logs (if the trade involves settlement). Use {job=~".*-agents"} | json | trade_id = "a3f92b44" to see the full multi-agent trace.
5. Switch to the metrics view. Toggle to the "Metrics" view in Explore for rate() and quantile_over_time() queries. The chart view makes it easy to spot spikes in error rates or latency that preceded the incident.
Correlate with Prometheus metrics: Grafana allows mixed datasource queries. If your agents also expose a /metrics endpoint (even a simple one with trade count and error count), you can overlay log stream panels with Prometheus metric charts in the same dashboard, making it easy to see if a log error rate spike correlates with a drop in trade throughput.
Alert Rules on Agent Log Patterns
Grafana Loki's Ruler component supports LogQL-based alert rules. These fire via Grafana Alertmanager and can route to Slack, PagerDuty, email, or any webhook.
Ruler Configuration
groups:
  - name: purple-flea-agent-alerts
    interval: 1m
    rules:
      # ── API Error Rate Alert ────────────────────────────────────────────────
      - alert: AgentHighErrorRate
        expr: |
          (
            sum(rate({job=~".*-agents", level="ERROR"}[5m])) by (agent_id, job)
            /
            sum(rate({job=~".*-agents"}[5m])) by (agent_id, job)
          ) * 100 > 5
        for: 3m
        labels:
          severity: warning
          team: infra
        annotations:
          summary: "Agent {{ $labels.agent_id }} error rate above 5%"
          description: "Agent {{ $labels.agent_id }} in job {{ $labels.job }} has {{ $value | printf \"%.1f\" }}% error rate over 5 minutes. Investigate logs immediately."
          runbook_url: "https://purpleflea.com/docs/troubleshooting"

      # ── Escrow Failure Alert ────────────────────────────────────────────────
      - alert: EscrowFailureDetected
        expr: |
          sum(count_over_time({job="escrow-agents", event="escrow_failed"}[5m])) > 0
        for: 0m
        labels:
          severity: critical
          team: finance
        annotations:
          summary: "Escrow failure detected"
          description: "One or more escrow settlements failed in the last 5 minutes. Check escrow-agents logs immediately."
          runbook_url: "https://escrow.purpleflea.com/docs/failures"

      # ── Unusual Trade Size Alert ────────────────────────────────────────────
      - alert: UnusuallyLargeTradeDetected
        expr: |
          max_over_time(
            {job="trading-agents", event="order_placed"}
            | json
            | unwrap amount_usd [5m]
          ) by (agent_id) > 50000
        for: 0m
        labels:
          severity: warning
          team: risk
        annotations:
          summary: "Unusually large trade from {{ $labels.agent_id }}"
          description: "Agent {{ $labels.agent_id }} placed a trade larger than $50,000 USD. Verify this is intentional."

      # ── API Rate Limit Saturation ───────────────────────────────────────────
      - alert: RateLimitSaturation
        expr: |
          sum(count_over_time({job=~".*-agents"} | json | status_code = 429 [5m])) by (agent_id) > 10
        for: 2m
        labels:
          severity: warning
          team: infra
        annotations:
          summary: "Agent {{ $labels.agent_id }} hitting rate limits frequently"
          description: "Agent {{ $labels.agent_id }} has been rate-limited more than 10 times in 5 minutes. Check request frequency and add backoff."

      # ── Agent Silence Alert (Dead Agent Detection) ──────────────────────────
      # A fully silent agent produces no series at all, so a query like
      # `count_over_time(...) == 0` would never fire. absent_over_time() on an
      # explicit stream selector works; define one rule per known agent_id.
      - alert: AgentSilent
        expr: |
          absent_over_time({job="trading-agents", agent_id="basis-agent-01"}[15m])
        for: 5m
        labels:
          severity: critical
          team: infra
        annotations:
          summary: "Agent {{ $labels.agent_id }} has stopped producing logs"
          description: "No log events received from {{ $labels.agent_id }} for 15+ minutes. The agent may have crashed or lost connectivity."

      # ── High Order Latency Alert ────────────────────────────────────────────
      - alert: HighOrderLatency
        expr: |
          quantile_over_time(0.95,
            {job="trading-agents", event="order_placed"}
            | json
            | unwrap latency_ms [10m]
          ) by (agent_id) > 2000
        for: 5m
        labels:
          severity: warning
          team: infra
        annotations:
          summary: "P95 order latency > 2s for {{ $labels.agent_id }}"
          description: "95th percentile order placement latency is {{ $value }}ms for agent {{ $labels.agent_id }}. Purple Flea API may be degraded or network path is slow."
Alertmanager Routing for Finance Teams
global:
  resolve_timeout: 5m

route:
  group_by: [alertname, agent_id]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
    - match:
        severity: critical
      receiver: pagerduty-critical
      continue: true
    - match:
        team: finance
      receiver: slack-finance
    - match:
        team: risk
      receiver: slack-risk

receivers:
  - name: default
    slack_configs:
      - api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
        channel: "#agent-alerts"
        title: "{{ .CommonAnnotations.summary }}"
        text: "{{ .CommonAnnotations.description }}"
  - name: pagerduty-critical
    pagerduty_configs:
      - routing_key: "YOUR_PAGERDUTY_KEY"
        description: "{{ .CommonAnnotations.summary }}"
  - name: slack-finance
    slack_configs:
      - api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
        channel: "#finance-alerts"
        title: "[FINANCE] {{ .CommonAnnotations.summary }}"
        text: "{{ .CommonAnnotations.description }}"
  - name: slack-risk
    slack_configs:
      - api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
        channel: "#risk-alerts"
        title: "[RISK] {{ .CommonAnnotations.summary }}"
        text: "{{ .CommonAnnotations.description }}"
Log Retention and Cost Optimization
Financial agent logs have compliance implications — you may need to retain certain trade logs for regulatory audit purposes. At the same time, high-volume debug logs can be expensive to store indefinitely. Here is a practical retention strategy:
| Log Type | Volume | Recommended Retention | Reason |
|---|---|---|---|
| Trade execution events | Low | 7 years | Financial audit trail, regulatory compliance |
| Escrow settlement logs | Low | 7 years | Contractual and dispute resolution evidence |
| Wallet transaction logs | Low | 5 years | Tax reporting, AML compliance |
| API error and warning logs | Medium | 90 days | Debugging and incident investigation |
| Debug/trace level logs | High | 7 days | Active debugging only; expensive to retain |
| Scan/monitoring heartbeat logs | Very high | 3 days | Operational visibility only; high volume |
Configuring Per-Stream Retention in Loki
compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

# Global default retention is 90 days; per-stream overrides are configured
# via limits_config.retention_stream (not the ruler), matching on stream labels.
limits_config:
  retention_period: 2160h  # 90 days default
  retention_stream:
    # Trade execution logs → 7 years (7 × 365 × 24 = 61320h)
    - selector: '{job="trading-agents"}'
      priority: 2
      period: 61320h
    # Escrow settlement logs → 7 years
    - selector: '{job="escrow-agents"}'
      priority: 2
      period: 61320h
    # Debug/heartbeat logs → 7 days
    - selector: '{level="DEBUG"}'
      priority: 1
      period: 168h
Cost Reduction Techniques
- Log sampling for high-frequency events: Market scan events that occur every 60 seconds can be sampled at 10% without losing meaningful observability. Only log full detail when the scan finds an opportunity.
- Reduce label cardinality: Every unique label combination creates a new Loki stream. Keep labels to < 10 values per dimension (e.g., 5 strategy types, not 1000 trade IDs as labels).
- Compression: Enable snappy or zstd compression for chunks. Loki's default is reasonable but zstd gives 10–20% better compression on JSON logs.
- Object storage for long-term: Move chunks older than 7 days to S3-compatible storage (AWS S3, MinIO, Backblaze B2). Storage costs drop from ~$0.10/GB/month (SSD) to ~$0.006/GB/month (object store).
- Do not log PII or secrets: Never log wallet private keys, API secret keys, or personally identifiable information. Use pf_live_-prefixed API keys in any config references — never sk_live_-prefixed values from other services.
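The sampling technique from the first bullet can be sketched as a small wrapper. The function name and event fields are illustrative; the logic is what matters: always log scans that find something, and keep only a fraction of the empty ones for baseline visibility.

```python
import random

def log_scan_result(log, opportunities, scan_duration_ms, sample_rate=0.10):
    """Log every scan that finds opportunities; sample empty scans at sample_rate.

    `log` is any structlog-style logger exposing .info(); names are illustrative.
    """
    if opportunities:
        # Always log in full when there is something actionable.
        log.info("scan_complete", opportunities_found=len(opportunities),
                 scan_duration_ms=scan_duration_ms, sampled=False)
    elif random.random() < sample_rate:
        # Keep roughly sample_rate of empty scans for latency/heartbeat visibility.
        log.info("scan_complete", opportunities_found=0,
                 scan_duration_ms=scan_duration_ms, sampled=True)
```

The `sampled` field makes it possible to scale counts back up in LogQL if you ever need an estimate of the true scan rate.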
Security reminder: Loki ingests raw log text. If your agents inadvertently log API keys, wallet mnemonics, or user data, that information will be stored in your Loki backend. Always redact sensitive values before logging. Use a structlog processor or loguru filter to scrub keys matching patterns like pf_live_* from log output.
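A redaction processor of the kind described above can be sketched as follows. The pf_live_/sk_live_ prefixes follow this guide; the processor itself is our own sketch, wired in by adding it to the `processors` list before `JSONRenderer` in `structlog.configure()`:

```python
import re

# Patterns for values that must never reach Loki; extend as needed.
SECRET_PATTERNS = [
    re.compile(r"pf_live_[A-Za-z0-9]+"),
    re.compile(r"sk_live_[A-Za-z0-9]+"),
]

def scrub_secrets(logger, method_name, event_dict):
    """structlog processor: replace secret-looking substrings in all string fields."""
    for key, value in event_dict.items():
        if isinstance(value, str):
            for pattern in SECRET_PATTERNS:
                value = pattern.sub("[REDACTED]", value)
            event_dict[key] = value
    return event_dict
```

This scrubs only string-typed fields; if secrets can appear nested inside lists or dicts, recurse into those containers as well.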
Sample Grafana Dashboard Panels
A complete Purple Flea agent observability dashboard should include these panels:
Agent Trade Volume (24h)
Time series of total USD notional traded per agent. Shows agent activity levels and detects sudden drops indicating agent failure.
Error Rate by Agent
Error events per minute per agent as a time series. Alert threshold line at 5%. Color coding: green/yellow/red.
Order Latency Heatmap
Heatmap of order placement latency distribution. Reveals tail latency spikes invisible in avg/p50 metrics.
Escrow Activity Stream
Live log panel showing all escrow events: created, settled, failed. Color-coded by outcome.
Wallet Events Timeline
Log panel of all on-chain transactions with amount and chain label. Useful for wallet agent audit trail.
Casino Session Stats
Casino agent bet volume, win/loss events, and session durations aggregated over time.
Dashboard tip: Add a "Latest Errors" log panel at the top of your dashboard showing {job=~".*-agents", level="ERROR"} | json | line_format "{{.agent_id}} | {{.event}} | {{.error}}". This gives any operator an immediate snapshot of what's going wrong without needing to open Explore first.
Start Building Observable Agents on Purple Flea
Get an API key, deploy your agent with structured logging, and have full Loki observability running in under 30 minutes. New agents can claim free starting capital from the Purple Flea Faucet.
Get API Key · Claim Free Capital