
Build a Crypto Trading Agent with NVIDIA NIM in 50 Lines of Python

March 6, 2026 · Purple Flea Team · 8 min read

Introduction

NVIDIA NIM (NVIDIA Inference Microservices) lets you run production-grade LLMs on your own GPU infrastructure behind an OpenAI-compatible API. Combined with Purple Flea's trading API, you get an autonomous trading agent whose prompts, strategy logic, and model reasoning never leave your own hardware; only the resulting orders and account queries are sent to the exchange.

No per-token cloud inference costs, no inference rate limits, and no dependence on a remote endpoint's latency at 3am when a market moves. This guide shows you how to wire up NIM's local inference endpoint to Purple Flea's perpetual futures trading API in roughly 50 lines of Python.

Why Local Inference for Trading Agents

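Cloud LLMs work fine for most agent tasks, but trading has requirements that shift the tradeoffs toward local inference: prompts and strategy logic stay private, latency does not depend on a shared remote endpoint, and there are no per-token costs or inference rate limits to budget around.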

Prerequisites

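Quick start: install the dependencies with pip install openai requests. You will also need a running NIM container and a Purple Flea API key. NIM exposes an OpenAI-compatible endpoint at http://localhost:8000/v1 by default, so the standard openai client works without any SDK changes.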
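Before wiring up the agent, it is worth confirming that the local endpoint is up and serving the model you expect. The sketch below uses only the standard library and assumes the endpoint implements the OpenAI-compatible GET /v1/models route; the helper names extract_model_ids and list_nim_models are illustrative, not part of NIM or Purple Flea.

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list:
    """Pull model IDs out of an OpenAI-style /models response body."""
    # Expected shape: {"object": "list", "data": [{"id": "meta/llama-3.1-70b-instruct", ...}]}
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_nim_models(base_url: str = "http://localhost:8000/v1", timeout: float = 5.0) -> list:
    """Return the model IDs the endpoint serves, or an empty list if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers URLError, refused connections and timeouts;
        # ValueError covers a response body that is not valid JSON
        return []
    return extract_model_ids(payload)


if __name__ == "__main__":
    models = list_nim_models()
    print("Served models:", models or "none (is the NIM container running?)")
```

If the list comes back empty, the agent code will fail on its first chat completion call, so this check is a cheap way to catch a stopped container or a wrong port.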

The Tool Schema

Define Purple Flea's trading capabilities as OpenAI-compatible function tools. The model will decide when and how to call each one:

  get_price: fetch the current mark price and 24h change for any perpetual market
  open_long: open a leveraged long position with a configurable size in USD
  close_trade: close an existing position by trade ID, in full or in part
  get_portfolio: return current open positions, unrealized PnL, and margin usage

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_price",
            "description": "Get current price for a perpetual market",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {"type": "string", "description": "e.g. BTC-PERP, ETH-PERP"}
                },
                "required": ["symbol"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "open_long",
            "description": "Open a long position on a perpetual futures market",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {"type": "string"},
                    "size_usd": {"type": "number", "description": "Position size in USD"},
                    "leverage": {"type": "number", "description": "Leverage multiplier, 1-10"}
                },
                "required": ["symbol", "size_usd"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_portfolio",
            "description": "Get current portfolio: open positions, PnL, margin",
            "parameters": {"type": "object", "properties": {}}
        }
    },
    {
        "type": "function",
        "function": {
            "name": "close_trade",
            "description": "Close an open position",
            "parameters": {
                "type": "object",
                "properties": {
                    "trade_id": {"type": "string"},
                    "close_pct": {"type": "number", "description": "0-100, percentage to close"}
                },
                "required": ["trade_id"]
            }
        }
    }
]
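The model fills in these arguments itself, so a call can arrive with a missing field or the wrong type (for example "50" as a string for size_usd). A minimal sketch of checking a proposed call against the schema before dispatching it follows; it checks only required fields and top-level types, and validate_tool_args and SAMPLE_TOOLS are illustrative names. A full JSON Schema validator would also enforce value ranges such as the 1-10 leverage bound.

```python
# Map JSON Schema type names to the Python types that json.loads produces
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}

# One entry copied from the tool schema above, so this sketch runs on its own
SAMPLE_TOOLS = [{
    "type": "function",
    "function": {
        "name": "open_long",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {"type": "string"},
                "size_usd": {"type": "number"},
                "leverage": {"type": "number"},
            },
            "required": ["symbol", "size_usd"],
        },
    },
}]


def validate_tool_args(tools: list, name: str, args: dict) -> list:
    """Check a model-proposed tool call against its schema; return a list of problems.

    Only required fields and top-level types are checked, not value ranges.
    """
    schemas = {t["function"]["name"]: t["function"]["parameters"] for t in tools}
    if name not in schemas:
        return [f"unknown tool: {name}"]
    schema = schemas[name]
    properties = schema.get("properties", {})
    problems = [f"missing required field: {field}"
                for field in schema.get("required", []) if field not in args]
    for field, value in args.items():
        if field not in properties:
            problems.append(f"unexpected field: {field}")
            continue
        json_type = properties[field].get("type")
        expected = JSON_TYPES.get(json_type)
        if expected is None:
            continue
        # bool is a subclass of int in Python, so True must not pass as a number
        wrong_bool = isinstance(value, bool) and json_type != "boolean"
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"{field} should be {json_type}")
    return problems


if __name__ == "__main__":
    # A string where the schema expects a number is caught before any order is sent
    print(validate_tool_args(SAMPLE_TOOLS, "open_long", {"symbol": "BTC-PERP", "size_usd": "50"}))
```

In the agent loop, a non-empty problem list can be returned to the model as the tool result instead of calling the exchange, which gives it a chance to correct the call.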

The Full Agent Code

Here is the complete trading agent. NIM handles the LLM reasoning and Purple Flea handles execution:

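from openai import OpenAI
import requests
import json
import os

# NIM serves an OpenAI-compatible endpoint locally; the client requires an
# api_key argument, so a placeholder value is passed for the local server
nim_client = OpenAI(base_url="http://localhost:8000/v1", api_key="nim-local")

# Read the Purple Flea key from the environment instead of hard-coding it
# (PURPLE_FLEA_API_KEY is an illustrative variable name)
PURPLE_FLEA_KEY = os.environ.get("PURPLE_FLEA_API_KEY", "your-pf-api-key")
PF_BASE = "https://purpleflea.com/api/v1"
HEADERS = {"Authorization": f"Bearer {PURPLE_FLEA_KEY}", "Content-Type": "application/json"}
REQUEST_TIMEOUT = 10  # seconds; a hung HTTP call should never stall the agent

# `tools` is the tool schema list defined in the previous section

def call_purple_flea(name: str, args: dict) -> dict:
    """Execute a Purple Flea API call based on tool name.

    Failures are returned as {"error": ...} so the model can see them and
    react, instead of an exception crashing the agent loop.
    """
    try:
        if name == "get_price":
            r = requests.get(f"{PF_BASE}/markets/{args['symbol']}/price",
                             headers=HEADERS, timeout=REQUEST_TIMEOUT)
        elif name == "open_long":
            payload = {
                "symbol": args["symbol"],
                "side": "long",
                "size": args["size_usd"],
                "leverage": args.get("leverage", 2)
            }
            r = requests.post(f"{PF_BASE}/trade", json=payload,
                              headers=HEADERS, timeout=REQUEST_TIMEOUT)
        elif name == "get_portfolio":
            r = requests.get(f"{PF_BASE}/portfolio", headers=HEADERS, timeout=REQUEST_TIMEOUT)
        elif name == "close_trade":
            payload = {"trade_id": args["trade_id"], "close_pct": args.get("close_pct", 100)}
            r = requests.post(f"{PF_BASE}/trade/close", json=payload,
                              headers=HEADERS, timeout=REQUEST_TIMEOUT)
        else:
            return {"error": f"unknown tool: {name}"}
        if not r.ok:
            # Pass the exchange's own error message back to the model
            return {"error": f"HTTP {r.status_code}", "body": r.text[:500]}
        return r.json()
    except KeyError as exc:
        return {"error": f"missing argument: {exc}"}
    except (requests.RequestException, ValueError) as exc:
        # Network failures, timeouts, and non-JSON response bodies
        return {"error": str(exc)}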

def run_trading_agent():
    messages = [
        {
            "role": "system",
            "content": (
                "You are an autonomous crypto trading agent running on Purple Flea's "
                "perpetual futures exchange. Your goal: maximize returns while keeping "
                "max drawdown under 10%. Check BTC-PERP price, review your portfolio, "
                "and make a trading decision. Be concise and decisive."
            )
        },
        {
            "role": "user",
            "content": "Analyze the market and take action if conditions warrant it."
        }
    ]

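    # Agentic loop: run until the model stops calling tools, with a hard cap
    # on iterations so a confused model cannot loop (and trade) indefinitely
    max_steps = 10
    for _ in range(max_steps):
        response = nim_client.chat.completions.create(
            model="meta/llama-3.1-70b-instruct",
            messages=messages,
            tools=tools,
            tool_choice="auto",
            temperature=0.3
        )

        msg = response.choices[0].message
        # Store the assistant turn as a plain dict (dropping null fields) so it
        # serializes cleanly when the history is sent back to the server
        messages.append(msg.model_dump(exclude_none=True))

        if not msg.tool_calls:
            print("Agent decision:", msg.content)
            break

        # Execute each tool call and return its result under the matching ID
        for tc in msg.tool_calls:
            fn_name = tc.function.name
            try:
                fn_args = json.loads(tc.function.arguments or "{}")
            except json.JSONDecodeError:
                fn_args = {}
                result = {"error": "tool arguments were not valid JSON"}
            else:
                result = call_purple_flea(fn_name, fn_args)
            print(f"Tool: {fn_name}({fn_args}) -> {result}")
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": json.dumps(result)
            })
    else:
        print(f"Stopped after {max_steps} steps without a final decision")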

if __name__ == "__main__":
    run_trading_agent()
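Before pointing the loop at a funded account, it helps to run it with real market data but simulated order placement. The sketch below wraps any tool executor (such as call_purple_flea) so that state-changing tools return a simulated result unless live=True. The names make_dispatcher and WRITE_TOOLS, and the shape of the simulated response, are hypothetical and not part of Purple Flea's API.

```python
from typing import Callable

ToolExecutor = Callable[[str, dict], dict]

# Tools that change account state; all other tools are treated as read-only
WRITE_TOOLS = {"open_long", "close_trade"}


def make_dispatcher(execute: ToolExecutor, live: bool = False) -> ToolExecutor:
    """Wrap a tool executor so state-changing tools are simulated unless live=True.

    Read-only tools still go to the real executor, so the model reasons over
    real prices and portfolio data while no orders are actually placed.
    """
    def dispatch(name: str, args: dict) -> dict:
        if name in WRITE_TOOLS and not live:
            # Illustrative simulated response, not a Purple Flea response format
            return {"dry_run": True, "tool": name, "args": args, "status": "not executed"}
        return execute(name, args)

    return dispatch
```

To use it, create dispatch = make_dispatcher(call_purple_flea) and call dispatch(fn_name, fn_args) in place of call_purple_flea(fn_name, fn_args) inside the loop. Because the simulated result says "not executed", the model is not misled into believing an order was filled.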

Handling Tool Calls in the Loop

The agentic loop above keeps running until the model returns a message with no tool calls; that is when it has finished acting and wants to summarize. A typical execution trace looks like this:

  1. Model calls get_portfolio(): sees a $500 balance, no open positions
  2. Model calls get_price(BTC-PERP): sees $78,200, down 2.1% in 24h
  3. Model calls open_long(BTC-PERP, size_usd=50, leverage=2)
  4. Model returns summary: "Opened $50 2x BTC long at $78,200. Stop loss mentally at -3%. Awaiting confirmation."

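The model sequences these calls itself; you don't need to orchestrate the order. This is the core of tool-calling agents: the LLM decides what information it needs and fetches it iteratively. Note that the "mental" stop loss in step 4 is not enforced by anything: the agent has no stop-loss tool and only acts again when you rerun the loop.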
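What makes this iteration work is the message history: each assistant turn that requests tools is followed by one role "tool" message per call, linked by tool_call_id. OpenAI's API rejects a history in which an assistant tool call has no matching reply, and other OpenAI-compatible servers may do the same. The sketch below shows the history after step 1 of the trace, plus a small check for unanswered calls; the portfolio fields (balance_usd, positions) are illustrative placeholders, not Purple Flea's documented response format.

```python
import json


def unanswered_tool_calls(messages: list) -> list:
    """Return IDs of assistant tool calls that have no matching role="tool" reply."""
    requested = [
        call["id"]
        for m in messages if m.get("role") == "assistant"
        for call in (m.get("tool_calls") or [])
    ]
    answered = {m.get("tool_call_id") for m in messages if m.get("role") == "tool"}
    return [call_id for call_id in requested if call_id not in answered]


# The history after step 1 of the trace (IDs and field values are illustrative)
history = [
    {"role": "system", "content": "You are an autonomous crypto trading agent..."},
    {"role": "user", "content": "Analyze the market and take action if conditions warrant it."},
    {
        "role": "assistant",
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_portfolio", "arguments": "{}"},
        }],
    },
    {
        "role": "tool",
        "tool_call_id": "call_1",
        "content": json.dumps({"balance_usd": 500, "positions": []}),
    },
]
```

Running unanswered_tool_calls on the history before each completion request is a cheap way to catch a skipped or crashed tool call before the server rejects the request.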

Production Tips

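Before running this agent with real capital, add safeguards in code rather than relying on the system prompt alone. Nothing in the loop above enforces the 10% drawdown limit or a maximum position size; the model is merely asked to respect them.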
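One way to make those limits binding is a hard check that runs before any open_long call is dispatched. This is a minimal sketch: RiskLimits, check_open_long, and the default caps are hypothetical, and the current and peak equity figures would have to come from get_portfolio (whose response format is not documented here) plus your own high-water-mark tracking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RiskLimits:
    # Hypothetical defaults; set these for your own account and strategy
    max_position_usd: float = 100.0
    max_leverage: float = 10.0      # mirrors the 1-10 range in the tool schema
    max_drawdown_pct: float = 10.0  # mirrors the 10% limit in the system prompt


def check_open_long(args: dict, equity_usd: float, peak_equity_usd: float,
                    limits: Optional[RiskLimits] = None) -> Optional[str]:
    """Return a reason to reject a proposed open_long call, or None to allow it."""
    limits = limits or RiskLimits()
    size = args.get("size_usd")
    leverage = args.get("leverage", 2)  # same default call_purple_flea applies
    for label, value in (("size_usd", size), ("leverage", leverage)):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return f"{label} must be a number"
    if size <= 0 or size > limits.max_position_usd:
        return f"size_usd {size} outside (0, {limits.max_position_usd:g}]"
    if not 1 <= leverage <= limits.max_leverage:
        return f"leverage {leverage} outside [1, {limits.max_leverage:g}]"
    # Drawdown from the account's high-water mark, in percent
    drawdown_pct = 0.0
    if peak_equity_usd > 0:
        drawdown_pct = 100.0 * (peak_equity_usd - equity_usd) / peak_equity_usd
    if drawdown_pct >= limits.max_drawdown_pct:
        return f"drawdown {drawdown_pct:.1f}% has reached the {limits.max_drawdown_pct:g}% limit"
    return None
```

When the check returns a reason, sending it back as the tool result instead of placing the order lets the model see why it was blocked and adjust, while the limit itself is enforced by code rather than by the prompt.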

Performance: Local vs Cloud

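Running llama-3.1-70b-instruct via NIM on A10G hardware, a full agent loop like the four-step trace above completes in approximately 8-12 seconds, or about 2-3 seconds per inference step. The equivalent loop against a cloud API takes 6-15 seconds, including network variance. Note that a 70B model's weights (roughly 35 GB even at 4-bit precision) exceed a single A10G's 24 GB of memory, so this setup requires a multi-GPU A10G node rather than one card.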
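To see whether inference or exchange round-trips dominate on your own hardware, time them separately rather than relying on the figures above. A minimal sketch using a context manager; the names timed and summarize and the labels are illustrative.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, log: list):
    """Append (label, elapsed wall-clock seconds) to log when the block exits."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log.append((label, time.perf_counter() - start))


def summarize(log: list) -> dict:
    """Total seconds per label, e.g. LLM inference vs exchange API calls."""
    totals = {}
    for label, seconds in log:
        totals[label] = totals.get(label, 0.0) + seconds
    return totals
```

In the agent loop, wrap the chat completion call in with timed("inference", timings): and each call_purple_flea call in with timed("exchange_api", timings):, then print summarize(timings) after the loop ends.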

For latency-critical strategies (scalping, liquidation cascades), the more meaningful optimization is reducing Purple Flea API round-trips and pre-fetching market data into the context window before starting the loop. The LLM reasoning itself is rarely the bottleneck for trade execution timescales.
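A minimal sketch of that pre-fetching step: prices and the portfolio are fetched in parallel threads and rendered as one text block that can be placed in the first user message, so the model can often decide without spending inference steps on get_price and get_portfolio. The function name and snapshot format are illustrative; fetch_price and fetch_portfolio are injected so the sketch runs on its own.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def prefetch_context(symbols: list, fetch_price: Callable[[str], dict],
                     fetch_portfolio: Callable[[], dict]) -> str:
    """Fetch prices and the portfolio concurrently and render one context string."""
    with ThreadPoolExecutor(max_workers=len(symbols) + 1) as pool:
        portfolio_future = pool.submit(fetch_portfolio)
        price_futures = {s: pool.submit(fetch_price, s) for s in symbols}
        snapshot = {
            "portfolio": portfolio_future.result(),
            "prices": {s: f.result() for s, f in price_futures.items()},
        }
    return "Market snapshot (fetched before this conversation):\n" + json.dumps(snapshot, indent=2)
```

With the agent code, the fetchers can be lambda s: call_purple_flea("get_price", {"symbol": s}) and lambda: call_purple_flea("get_portfolio", {}); append the returned string to the user message before starting the loop.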

Conclusion

NVIDIA NIM plus Purple Flea gives you a local, private autonomous trading stack with no inference rate limits in under 50 lines. Because the interface is OpenAI-compatible, switching models only means changing the model argument (and running the matching NIM container): try smaller, faster models like llama-3.1-8b for high-frequency signals and larger models for portfolio-level strategy.

The full trading API docs are at /trading-api, the API reference is at /api-reference, and the NVIDIA NIM integration guide is at /for-nvidia-nim.