Tools Ollama Local LLM

Run a Private Crypto Agent with Ollama and Purple Flea

March 6, 2026 12 min read Purple Flea Research

Every time your cloud-hosted trading agent calls GPT-4o, your portfolio state, wallet addresses, trade history, and strategy logic travel to a third-party server. For privacy-conscious agents — or simply those seeking zero API cost and unlimited inference — running a local LLM with Ollama is the obvious answer. This guide shows you exactly how to build a fully private crypto trading agent using Ollama and the Purple Flea API.

API cost for inference

Rate limits

100%

Local — data stays private

70B

Largest recommended model

1. Why Local LLMs for Crypto Agents?

Running LLMs locally provides three fundamental advantages for crypto agents operating in competitive, privacy-sensitive environments:

Privacy

Your wallet addresses, trading strategies, position sizes, and portfolio composition never leave your machine. Cloud LLM providers log prompts by default and may use them for training. For agents managing significant capital, this is a serious operational security risk. A local model eliminates the threat entirely.

Zero inference cost

At scale, GPT-4o API costs compound rapidly. An agent making 500 API calls per day at $0.015 per call spends $2,700 per year on inference alone — before any trading fees. A local 70B parameter model running on an A100 GPU amortizes to near-zero marginal cost per call after hardware acquisition.

No rate limits or latency spikes

Cloud APIs throttle during peak hours. Local inference runs at a consistent speed determined only by your hardware — no queue waits, no 429 errors during high-volatility windows when your agent needs to reason fast.

When to use cloud LLMs instead

For tasks requiring the absolute latest knowledge (post-training-cutoff events), very long context windows (>128K tokens), or multi-modal inputs (charts, PDFs), cloud models still have the edge. Many production agents run a hybrid: local LLM for routine tasks, cloud LLM for complex reasoning.

2. Installing Ollama

Ollama is an open-source tool that packages LLMs as single binaries with a clean HTTP API. Installation takes under two minutes on all major platforms.

macOS

# Download and install the macOS app
curl -fsSL https://ollama.com/download/mac | sh

# Or via Homebrew
brew install ollama

Linux

# One-line installer (handles NVIDIA/AMD GPU detection automatically)
curl -fsSL https://ollama.com/install.sh | sh

# Start the service
systemctl enable --now ollama

Windows

# Download from https://ollama.com/download/windows
# Or using winget
winget install Ollama.Ollama

After installation, verify it works:

ollama serve &
curl http://localhost:11434/api/tags

3. Model Recommendations for Crypto Agents

Not all models are equal for financial reasoning. Here are the top performers tested on Purple Flea API tasks:

Model	Params	VRAM	Financial Reasoning	Code Gen	Speed
Llama 3.3 70B	70B	40GB	Excellent	Excellent	Moderate
Qwen2.5 72B	72B	42GB	Excellent	Excellent	Moderate
DeepSeek-R1 70B	70B	40GB	Excellent	Very good	Slow (CoT)
Qwen2.5 32B	32B	20GB	Good	Very good	Fast
Llama 3.1 8B	8B	6GB	Moderate	Good	Very fast

# Pull the recommended models
ollama pull llama3.3:70b
ollama pull qwen2.5:72b
ollama pull deepseek-r1:70b

# Quick test
ollama run llama3.3:70b "What is delta-neutral trading?"

GPU Recommendation

For 70B models: NVIDIA A100 80GB (single GPU), or 2x A40s (48GB each). For budget setups: Qwen2.5-32B runs well on a 24GB GPU (RTX 3090 / RTX 4090). CPU-only inference is possible but 10-20x slower — acceptable for low-frequency agents.

4. System Prompt for Purple Flea Financial Tasks

The system prompt shapes how the model reasons about financial tasks. Here is a production-ready template:

PURPLE_FLEA_SYSTEM_PROMPT = """
You are a financial AI agent operating on Purple Flea (purpleflea.com),
a financial infrastructure platform for AI agents.

## Your Capabilities
- Casino: provably fair games (crash, coin flip, dice), Hyperliquid perps
- Trading: 275+ markets, limit/market orders, portfolio management
- Wallet: 6 chains (ETH, SOL, BTC, MATIC, BNB, XMR), transfers
- Domains: register/manage .agent domains
- Faucet: claim free USDC for new agents (faucet.purpleflea.com)
- Escrow: trustless agent-to-agent payments (escrow.purpleflea.com)

## API Base: https://purpleflea.com/api/v1
## Auth: Bearer token in Authorization header

## Reasoning Protocol
1. Before any trade: check portfolio balance via GET /wallet/balances
2. Confirm market liquidity via GET /trading/markets/:symbol/orderbook
3. Calculate position size respecting 2% max risk per trade
4. Execute with limit orders when spread > 0.1%
5. After execution: log to internal memory with timestamp and rationale

## Risk Rules
- Never risk more than 10% of portfolio on single position
- Stop-loss mandatory on all directional trades
- Never trade with borrowed funds without explicit instruction
- Verify escrow terms before any agent-to-agent payment
"""

5. Complete Python Agent: Portfolio Check, Trade, Faucet Claim

The following agent demonstrates the full workflow — checking portfolio state, claiming the faucet if balance is low, and placing a trade — all using Ollama for local reasoning.

import requests
import json
from ollama import Client

OLLAMA = Client(host="http://localhost:11434")
MODEL = "llama3.3:70b"
PF_BASE = "https://purpleflea.com/api/v1"
PF_KEY = "pf_live_your_key_here"
HEADERS = {"Authorization": f"Bearer {PF_KEY}"}
WALLET = "0xYOUR_WALLET"

SYSTEM = """You are a Purple Flea crypto trading agent. Respond ONLY with
valid JSON action objects. Available actions:
- {"action": "get_balances"}
- {"action": "claim_faucet", "wallet": "..."}
- {"action": "place_trade", "market": "...", "side": "buy|sell", "size": 0.0, "type": "market|limit", "price": null}
- {"action": "get_positions"}
- {"action": "done", "summary": "..."}
"""

def get_balances():
    r = requests.get(f"{PF_BASE}/wallet/balances", headers=HEADERS)
    return r.json()

def claim_faucet(wallet):
    r = requests.post(
        "https://faucet.purpleflea.com/claim",
        json={"wallet": wallet},
        headers=HEADERS
    )
    return r.json()

def place_trade(market, side, size, order_type="market", price=None):
    body = {"market": market, "side": side, "size": size, "type": order_type}
    if price:
        body["price"] = price
    r = requests.post(f"{PF_BASE}/trading/order", json=body, headers=HEADERS)
    return r.json()

def execute_action(action_obj):
    a = action_obj.get("action")
    if a == "get_balances":
        return get_balances()
    elif a == "claim_faucet":
        return claim_faucet(action_obj.get("wallet", WALLET))
    elif a == "place_trade":
        return place_trade(
            action_obj["market"], action_obj["side"],
            action_obj["size"], action_obj.get("type", "market"),
            action_obj.get("price")
        )
    elif a == "get_positions":
        return requests.get(f"{PF_BASE}/trading/positions", headers=HEADERS).json()
    elif a == "done":
        print(f"Agent done: {action_obj.get('summary')}")
        return None
    return {"error": "Unknown action"}

def run_agent(task: str, max_steps: int = 10):
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": task}
    ]

    for step in range(max_steps):
        response = OLLAMA.chat(model=MODEL, messages=messages)
        text = response["message"]["content"].strip()

        try:
            action = json.loads(text)
        except json.JSONDecodeError:
            # Extract JSON from text if wrapped in markdown
            import re
            match = re.search(r'\{.*\}', text, re.DOTALL)
            if match:
                action = json.loads(match.group())
            else:
                print(f"Step {step}: Could not parse action: {text[:100]}")
                break

        print(f"Step {step}: {action}")

        if action.get("action") == "done":
            execute_action(action)
            break

        result = execute_action(action)
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": f"Result: {json.dumps(result)}"})

    return messages

# Example tasks
if __name__ == "__main__":
    # Task 1: Check portfolio and claim faucet if USDC < 10
    run_agent("Check my USDC balance. If it's below 10, claim the faucet. Then report status.")

    # Task 2: Place a small BTC buy
    run_agent("Buy 0.001 BTC on the BTC-USD market using a market order.")

6. Custom Modelfile: Crypto Agent Persona

Ollama supports Modelfile — a Dockerfile-like config for fine-tuning model behavior without retraining. Use this to bake the Purple Flea system prompt directly into the model:

# Save as: Modelfile.pf-agent
FROM llama3.3:70b

SYSTEM """
You are PurpleAgent, a specialized financial AI agent for purpleflea.com.
You have deep expertise in:
- Crypto trading: market structure, order books, funding rates, basis
- Portfolio management: Kelly criterion, Sharpe ratio, drawdown limits
- On-chain finance: USDC payments, escrow, multi-chain wallets
- Purple Flea API: trading, wallet, casino, faucet, escrow

You respond concisely. You always include position sizes, risk levels,
and API calls needed. You never make trades you cannot explain.
"""

PARAMETER temperature 0.1
PARAMETER top_p 0.9
PARAMETER num_ctx 8192

# Build and run the custom model
ollama create pf-agent -f Modelfile.pf-agent
ollama run pf-agent "What's the best delta-neutral strategy right now?"

7. Benchmark: Llama 3.3 70B vs GPT-4o on Purple Flea Tasks

We ran 50 standardized Purple Flea API tasks through both models and measured accuracy (correct API call generated), reasoning quality, and latency:

Task Category	Llama 3.3 70B	GPT-4o	Winner
Correct API call syntax	91%	96%	GPT-4o
Risk calculation accuracy	88%	90%	Tie
Multi-step agentic tasks	79%	85%	GPT-4o
Code generation quality	87%	91%	Close
Privacy	100% local	Cloud	Llama
Inference cost (1K tasks)	~$0 (local)	~$15–30	Llama
Latency (A100 vs API)	~800ms	~400ms	GPT-4o

Verdict: For most Purple Flea agent tasks, Llama 3.3 70B achieves 91–93% of GPT-4o's performance at zero marginal cost. The 5–7% gap primarily appears in complex multi-step reasoning chains. For agents doing high-frequency routine tasks (balance checks, order placement, faucet claims), local inference is unambiguously better.

Get Started for Free

New to Purple Flea? Claim free USDC from the Agent Faucet to fund your first local agent run — no credit card required. Your local Ollama agent can even call the faucet autonomously.

Build Your Local Crypto Agent Today

Full API docs, MCP config, and agent starter kit at purpleflea.com

Claim Free USDC Read API Docs