Introduction
NVIDIA NIM (NVIDIA Inference Microservices) lets you run production-grade LLMs on your own GPU infrastructure with an OpenAI-compatible API. Combined with Purple Flea's trading API, you get a fully autonomous, privacy-preserving trading agent where your positions, strategies, and alpha never leave your own hardware.
No cloud API costs, no rate limits, no latency spikes at 3am when a market moves. This guide shows you how to wire up NIM's local inference endpoint to Purple Flea's perpetual futures trading API in roughly 50 lines of Python.
Why Local Inference for Trading Agents
Cloud LLMs work fine for most agent tasks, but trading has unique requirements that push the tradeoffs in favor of local inference:
- No cloud API latency: Local inference adds sub-5ms overhead vs 200-800ms round-trip to OpenAI's servers. When a liquidation cascade is in progress, milliseconds matter.
- Strategy stays private: Your prompts, market context, and position logic never leave your hardware. No AI provider can observe your trading patterns or sell aggregated signal data.
- Consistent throughput: No shared-infrastructure rate limits. Your agent can run inference loops as fast as your GPU allows, which is critical during high-volatility periods when you want faster decision cycles.
- Run multiple models in parallel: Dedicated hardware lets you run a fast model for execution decisions and a slower, larger model for risk analysis simultaneously.
Prerequisites
- NVIDIA GPU (A10G, A100, or H100 recommended; RTX 3090/4090 works for smaller models)
- NVIDIA NIM installed and running locally (see the NVIDIA NIM docs)
- Purple Flea API key from /api-keys
- Python 3.10+ with openai and requests installed
- A model pulled into NIM; meta/llama-3.1-70b-instruct is recommended for trading
Quick start: Install dependencies with pip install openai requests. NIM exposes an OpenAI-compatible endpoint at http://localhost:8000/v1 by default, so no SDK changes are needed.
The Tool Schema
Define Purple Flea's trading capabilities as OpenAI-compatible function tools. The model will decide when and how to call each one:
get_price
Fetch current mark price and 24h change for any perpetual market
open_long
Open a leveraged long position with configurable size in USD
close_trade
Close an existing position by trade ID, full or partial
get_portfolio
Return current open positions, unrealized PnL, and margin usage
tools = [
{
"type": "function",
"function": {
"name": "get_price",
"description": "Get current price for a perpetual market",
"parameters": {
"type": "object",
"properties": {
"symbol": {"type": "string", "description": "e.g. BTC-PERP, ETH-PERP"}
},
"required": ["symbol"]
}
}
},
{
"type": "function",
"function": {
"name": "open_long",
"description": "Open a long position on a perpetual futures market",
"parameters": {
"type": "object",
"properties": {
"symbol": {"type": "string"},
"size_usd": {"type": "number", "description": "Position size in USD"},
"leverage": {"type": "number", "description": "Leverage multiplier, 1-10"}
},
"required": ["symbol", "size_usd"]
}
}
},
{
"type": "function",
"function": {
"name": "get_portfolio",
"description": "Get current portfolio: open positions, PnL, margin",
"parameters": {"type": "object", "properties": {}}
}
},
{
"type": "function",
"function": {
"name": "close_trade",
"description": "Close an open position",
"parameters": {
"type": "object",
"properties": {
"trade_id": {"type": "string"},
"close_pct": {"type": "number", "description": "0-100, percentage to close"}
},
"required": ["trade_id"]
}
}
}
]
The Full Agent Code
Here is the complete trading agent. NIM handles the LLM reasoning; Purple Flea handles execution:
from openai import OpenAI
import requests
import json
# NIM runs locally on OpenAI-compatible endpoint
nim_client = OpenAI(base_url="http://localhost:8000/v1", api_key="nim-local")
PURPLE_FLEA_KEY = "your-pf-api-key"
PF_BASE = "https://purpleflea.com/api/v1"
HEADERS = {"Authorization": f"Bearer {PURPLE_FLEA_KEY}", "Content-Type": "application/json"}
def call_purple_flea(name: str, args: dict) -> dict:
"""Execute a Purple Flea API call based on tool name."""
if name == "get_price":
r = requests.get(f"{PF_BASE}/markets/{args['symbol']}/price", headers=HEADERS)
return r.json()
elif name == "open_long":
payload = {
"symbol": args["symbol"],
"side": "long",
"size": args["size_usd"],
"leverage": args.get("leverage", 2)
}
r = requests.post(f"{PF_BASE}/trade", json=payload, headers=HEADERS)
return r.json()
elif name == "get_portfolio":
r = requests.get(f"{PF_BASE}/portfolio", headers=HEADERS)
return r.json()
elif name == "close_trade":
payload = {"trade_id": args["trade_id"], "close_pct": args.get("close_pct", 100)}
r = requests.post(f"{PF_BASE}/trade/close", json=payload, headers=HEADERS)
return r.json()
return {"error": "unknown tool"}
def run_trading_agent():
messages = [
{
"role": "system",
"content": (
"You are an autonomous crypto trading agent running on Purple Flea's "
"perpetual futures exchange. Your goal: maximize returns while keeping "
"max drawdown under 10%. Check BTC-PERP price, review your portfolio, "
"and make a trading decision. Be concise and decisive."
)
},
{
"role": "user",
"content": "Analyze the market and take action if conditions warrant it."
}
]
# Agentic loop: run until model stops calling tools
while True:
response = nim_client.chat.completions.create(
model="meta/llama-3.1-70b-instruct",
messages=messages,
tools=tools,
tool_choice="auto",
temperature=0.3
)
msg = response.choices[0].message
messages.append(msg)
if not msg.tool_calls:
print("Agent decision:", msg.content)
break
# Execute each tool call
for tc in msg.tool_calls:
fn_name = tc.function.name
fn_args = json.loads(tc.function.arguments)
result = call_purple_flea(fn_name, fn_args)
print(f"Tool: {fn_name}({fn_args}) -> {result}")
messages.append({
"role": "tool",
"tool_call_id": tc.id,
"content": json.dumps(result)
})
if __name__ == "__main__":
run_trading_agent()
Handling Tool Calls in the Loop
The agentic loop above keeps running until the model returns a message with no tool calls; that's when it has finished acting and wants to summarize. A typical execution trace looks like:
- Model calls get_portfolio() → sees $500 balance, no open positions
- Model calls get_price(BTC-PERP) → sees $78,200, down 2.1% in 24h
- Model calls open_long(BTC-PERP, size_usd=50, leverage=2)
- Model returns a summary: "Opened $50 2x BTC long at $78,200. Stop loss mentally at -3%. Awaiting confirmation."
The model sequences these calls itself; you don't need to orchestrate the order. This is the core of tool-calling agents: the LLM decides what information it needs and fetches it iteratively.
Production Tips
Before running this agent with real capital, add these safeguards:
- Hard position limits: Add a pre-check that rejects any open_long call where size_usd exceeds 10% of portfolio balance. Do this in call_purple_flea(), not in the prompt.
- Stop-loss enforcement: After opening a position, spawn a background monitor thread that polls the price every 30s and closes the trade if PnL falls below your threshold.
- Async execution: Use asyncio and aiohttp for the tool execution layer. NIM inference is already fast; don't let network I/O stall the loop.
- Rate limiting: Add a minimum 5-minute interval between consecutive trades to avoid chasing noise. Track the last trade timestamp in a simple dict.
- Logging: Write all tool calls and results to a structured log file. Post-trade analysis is the fastest way to improve your agent's system prompt.
Performance: Local vs Cloud
On an A10G GPU running llama-3.1-70b-instruct via NIM, a full 4-tool-call agent loop completes in roughly 8-12 seconds total, about 2-3 seconds per inference step. The cloud API equivalent is 6-15 seconds once network variance is included.
For latency-critical strategies (scalping, liquidation cascades), the more meaningful optimization is reducing Purple Flea API round-trips and pre-fetching market data into the context window before starting the loop. The LLM reasoning itself is rarely the bottleneck for trade execution timescales.
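Pre-fetching can be as simple as injecting a snapshot message before the loop starts, so the model's first inference step already has prices and skips a get_price round-trip. A minimal sketch, assuming price responses carry mark_price and change_24h fields (illustrative field names; check the actual /markets response shape), with build_snapshot_message as a hypothetical helper:

```python
def build_snapshot_message(prices: dict[str, dict]) -> dict:
    """Turn pre-fetched price data into a user message for the agent.

    `prices` maps symbol -> price payload; the mark_price and change_24h
    keys are assumed here for illustration.
    """
    lines = [
        f"{sym}: mark={p['mark_price']} change_24h={p['change_24h']}%"
        for sym, p in prices.items()
    ]
    return {"role": "user", "content": "Market snapshot:\n" + "\n".join(lines)}

# Before run_trading_agent's loop, prices could be gathered via the same
# tool executor, e.g.:
# prices = {s: call_purple_flea("get_price", {"symbol": s})
#           for s in ("BTC-PERP", "ETH-PERP")}
# messages.append(build_snapshot_message(prices))
```

Appending this message to the initial conversation typically removes one full tool-call iteration from the loop.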
Conclusion
NVIDIA NIM plus Purple Flea gives you a local, private, rate-limit-free autonomous trading stack in under 50 lines. The OpenAI-compatible interface means you can swap any model โ try smaller, faster models like llama-3.1-8b for high-frequency signals and larger models for portfolio-level strategy.
Full trading API docs at /trading-api, API reference at /api-reference, and the NVIDIA NIM integration guide at /for-nvidia-nim.