Together AI hosts over 100 open-source models — Llama 3.1, Mixtral, DeepSeek-V3, Qwen 2.5 — all accessible through a single, OpenAI-compatible API. For developers building crypto agents, this means you can swap models in and out freely, compare financial reasoning quality across architectures, and run inference at a fraction of the cost of proprietary APIs.
This guide walks through building a complete crypto trading agent that uses Together AI for inference and Purple Flea for execution. By the end you will have a working agent loop that queries market prices, reasons about opportunities, and places real trades.
## Why Together AI for Financial Agents
- 100+ open models — every major open-source release available via one API key, including all Llama, Mixtral, DeepSeek, Qwen, and Gemma variants
- Sub-100ms inference for many 7B-70B models — critical for agents running tight decision loops
- Easy model comparison — test whether Llama 3.1 70B or DeepSeek-V3 reasons better about your specific trading signals without changing any other code
- Cost efficiency — $0.20–$2.00 per million tokens means you can run hundreds of agent loops per dollar, making continuous operation economical
- OpenAI-compatible API — drop-in replacement; no SDK migration required if you're coming from GPT-4
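The compatibility claim in the last bullet is concrete: Together serves the same chat completions request shape at `https://api.together.xyz/v1`, so an existing OpenAI-style payload works unchanged and only the base URL, key, and model name differ. A minimal sketch (the `chat_payload` helper is our own illustration, not part of either SDK):

```python
# Build an OpenAI-style chat completions body. The same JSON shape is
# accepted by api.openai.com and api.together.xyz; only base URL, API
# key, and model name change.
TOGETHER_BASE = "https://api.together.xyz/v1"

def chat_payload(model: str, user_msg: str, max_tokens: int = 256) -> dict:
    """OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": max_tokens,
    }

payload = chat_payload("meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo", "ping")
# POST this to f"{TOGETHER_BASE}/chat/completions" with your bearer token.
```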
## Setup

Install the Together Python SDK and the requests library:

```bash
pip install together requests
```

Export your API keys:

```bash
export TOGETHER_API_KEY="your-together-api-key"
export PURPLEFLEA_API_KEY="your-purpleflea-api-key"
```
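A missing key only surfaces later as an opaque 401, so it is worth failing fast at startup. A small check along these lines (the `missing_keys` helper is ours, not part of either SDK):

```python
import os

REQUIRED_KEYS = ("TOGETHER_API_KEY", "PURPLEFLEA_API_KEY")

def missing_keys(env=os.environ):
    """Return the names of required API keys that are unset or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

# At startup, fail fast with a readable message instead of a late auth error:
# if missing_keys():
#     raise SystemExit(f"Set these environment variables: {missing_keys()}")
```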
## Tool Definitions for Purple Flea
Together AI supports function calling using the same JSON schema format as OpenAI. Define the Purple Flea tools you want the model to use:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_market_price",
            "description": (
                "Get the current price and 24h change for a crypto perpetual market. "
                "Call this before making any trading decision."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Market symbol, e.g. BTC-PERP, ETH-PERP, SOL-PERP"
                    }
                },
                "required": ["symbol"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "execute_trade",
            "description": (
                "Open a long or short perpetual position on Purple Flea Trading. "
                "Only call this when you have a clear directional view."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {"type": "string", "description": "Market symbol"},
                    "side": {"type": "string", "enum": ["long", "short"]},
                    "size_usd": {"type": "number", "description": "Position size in USD"},
                    "leverage": {
                        "type": "number",
                        "description": "Leverage multiplier, 1-10x. Default to 2 if uncertain."
                    }
                },
                "required": ["symbol", "side", "size_usd"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_portfolio",
            "description": "Get current open positions and available balance.",
            "parameters": {"type": "object", "properties": {}}
        }
    }
]
```
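The schema tells the model what to send, but nothing enforces it at runtime: models occasionally omit a required field or pick a leverage outside 1-10. A defensive check before dispatch is cheap insurance. A sketch (the `validate_trade_args` helper is our own, not part of the Purple Flea API):

```python
def validate_trade_args(args: dict) -> dict:
    """Check execute_trade arguments against the tool schema before dispatch.

    Returns normalized args, or raises ValueError with a message that can
    be fed back to the model as a tool error result.
    """
    for field in ("symbol", "side", "size_usd"):
        if field not in args:
            raise ValueError(f"Missing required field: {field}")
    if args["side"] not in ("long", "short"):
        raise ValueError(f"side must be 'long' or 'short', got {args['side']!r}")
    if args["size_usd"] <= 0:
        raise ValueError("size_usd must be positive")
    leverage = args.get("leverage", 2)  # schema default
    if not 1 <= leverage <= 10:
        raise ValueError(f"leverage must be between 1 and 10, got {leverage}")
    return {**args, "leverage": leverage}
```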
## Full Agent Loop
The core loop is a standard agentic ReAct pattern: observe, reason, act, repeat until the model returns a final answer with no tool calls.
```python
import json
import os

import requests
import together

client = together.Together(api_key=os.environ["TOGETHER_API_KEY"])

PF_KEY = os.environ["PURPLEFLEA_API_KEY"]
PF_BASE = "https://purpleflea.com/api/v1"
HEADERS = {
    "Authorization": f"Bearer {PF_KEY}",
    "Content-Type": "application/json"
}


def execute_tool(name: str, args: dict) -> dict:
    """Dispatch tool calls to the Purple Flea API."""
    if name == "get_market_price":
        r = requests.get(
            f"{PF_BASE}/markets/{args['symbol']}/price",
            headers=HEADERS,
            timeout=10
        )
        return r.json()
    elif name == "execute_trade":
        r = requests.post(
            f"{PF_BASE}/trade",
            json={
                "symbol": args["symbol"],
                "side": args["side"],
                "size": args["size_usd"],
                "leverage": args.get("leverage", 2)
            },
            headers=HEADERS,
            timeout=10
        )
        return r.json()
    elif name == "get_portfolio":
        r = requests.get(f"{PF_BASE}/portfolio", headers=HEADERS, timeout=10)
        return r.json()
    return {"error": f"Unknown tool: {name}"}


def run_agent(model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"):
    messages = [
        {
            "role": "system",
            "content": (
                "You are an autonomous crypto trading agent operating on Purple Flea Trading. "
                "Your goal is to analyse current market conditions for BTC and ETH, check your "
                "existing portfolio, and make a trading decision. "
                "Be concise in your reasoning. Always check prices before trading. "
                "Never risk more than 20% of available balance on a single trade."
            )
        },
        {
            "role": "user",
            "content": "Analyse the market and take any action you think is appropriate."
        }
    ]

    print(f"Running agent with model: {model}\n")

    while True:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            tool_choice="auto",
            max_tokens=1024
        )
        msg = response.choices[0].message

        # Serialize the assistant turn back into the history. Only include
        # tool_calls when present; the SDK's tool-call objects are converted
        # to plain dicts so the list can be sent back on the next request.
        assistant_turn = {"role": "assistant", "content": msg.content}
        if msg.tool_calls:
            assistant_turn["tool_calls"] = [tc.model_dump() for tc in msg.tool_calls]
        messages.append(assistant_turn)

        # No tool calls — agent has finished reasoning
        if not msg.tool_calls:
            print("Agent decision:\n")
            print(msg.content)
            break

        # Execute each tool call and feed results back
        for tc in msg.tool_calls:
            fn_name = tc.function.name
            fn_args = json.loads(tc.function.arguments)
            print(f"  [tool] {fn_name}({fn_args})")
            result = execute_tool(fn_name, fn_args)
            print(f"  [result] {json.dumps(result)[:200]}")
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": json.dumps(result)
            })


if __name__ == "__main__":
    run_agent()
```
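Both APIs can fail transiently (rate limits, network blips), and a single unhandled exception would kill the loop mid-run. A simple exponential backoff wrapper, our own sketch rather than part of either SDK, keeps one flaky request from ending the session:

```python
import time

def backoff_delays(retries: int, base: float = 1.0) -> list:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(retries)]

def with_retries(fn, retries: int = 3, base: float = 1.0):
    """Call fn(); on exception, sleep per the schedule and retry."""
    delays = backoff_delays(retries, base)
    for attempt, delay in enumerate(delays):
        try:
            return fn()
        except Exception:
            if attempt == len(delays) - 1:
                raise  # out of retries: surface the real error
            time.sleep(delay)

# Usage inside the loop:
#   result = with_retries(lambda: execute_tool(fn_name, fn_args))
```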
To try this with zero risk, point `execute_trade` at the paper trading endpoint instead: `POST /api/v1/paper-trade`. All signals and reasoning stay real; no funds are moved.
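One low-friction way to wire that in is a dry-run flag that swaps the endpoint inside the dispatcher and leaves everything else untouched. A sketch (the `PF_DRY_RUN` flag and `trade_path` helper are our own; the `/paper-trade` path is the endpoint mentioned above):

```python
import os

# Set PF_DRY_RUN=1 to route every trade to the paper-trading endpoint.
DRY_RUN = os.environ.get("PF_DRY_RUN", "0") == "1"

def trade_path(dry_run: bool = DRY_RUN) -> str:
    """Endpoint path for execute_trade: paper or real."""
    return "/paper-trade" if dry_run else "/trade"

# In execute_tool, replace the hard-coded "/trade" with:
#   r = requests.post(f"{PF_BASE}{trade_path()}", json=..., headers=HEADERS)
```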
## Model Comparison for Finance Tasks
Different models have different strengths for financial reasoning. Here is how the major Together AI models compare on crypto agent tasks:
| Model | Latency | Cost / 1M tokens | Best For |
|---|---|---|---|
| Llama 3.1 70B Instruct Turbo | Fast | $0.88 | Best balance of speed, cost, and quality — recommended default |
| Llama 3.1 405B Instruct Turbo | Slow | $3.50 | Most capable for complex multi-step reasoning; use sparingly |
| Mixtral 8x22B Instruct | Fast | $1.20 | Fast MoE architecture; strong at structured output and tool use |
| DeepSeek-V3 | Medium | $0.27 | Best code generation and logical reasoning at very low cost |
| Qwen 2.5 72B Instruct | Medium | $0.90 | Strong quantitative and mathematical reasoning |
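The cost column maps directly to a per-run budget: an agent loop of a few tool-call rounds consumes on the order of 3,000 tokens. A small helper makes the math explicit (the helper is ours; prices come from the table, and the full model IDs are our best-effort mapping to Together's catalog, so double-check both against current pricing):

```python
PRICE_PER_M = {  # USD per 1M tokens, from the table above
    "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo": 0.88,
    "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo": 3.50,
    "mistralai/Mixtral-8x22B-Instruct-v0.1": 1.20,
    "deepseek-ai/DeepSeek-V3": 0.27,
    "Qwen/Qwen2.5-72B-Instruct-Turbo": 0.90,
}

def run_cost(model: str, tokens: int) -> float:
    """Estimated USD cost of one agent run consuming `tokens` tokens."""
    return PRICE_PER_M[model] * tokens / 1_000_000

cost = run_cost("meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo", 3_000)
# ≈ $0.0026 per run, i.e. roughly 380 runs per dollar
```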
## Multi-Model Cascade Strategy
Running the most expensive model on every tick is wasteful. The most cost-effective approach is a signal cascade: a cheap fast model filters incoming data and only escalates to the expensive model when a signal exceeds a threshold.
```python
def run_cascade_agent():
    """
    Stage 1: Fast cheap model evaluates raw market data and scores the signal.
    Stage 2: Expensive large model decides and acts only if signal score > 0.8.
    """
    # Stage 1: Signal scoring with Llama 8B (fast, cheap)
    scores_response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
        messages=[
            {"role": "system", "content": "Rate this market condition as a trading signal from 0.0 to 1.0. Reply with only a number."},
            {"role": "user", "content": get_market_snapshot()}  # returns raw price/volume data
        ],
        max_tokens=8
    )
    signal_score = float(scores_response.choices[0].message.content.strip())
    print(f"Signal score: {signal_score:.2f}")

    if signal_score < 0.8:
        print("Signal below threshold — skipping this tick.")
        return

    # Stage 2: Full reasoning with Llama 405B (slow, expensive — but only when needed)
    print("Signal strong — escalating to full reasoning model.")
    run_agent(model="meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo")
```
In practice this reduces calls to the expensive model by 70–90% while still executing on the best opportunities. The filtering pass costs less than $0.001 per evaluation, making it economical to run every few seconds.
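One fragile spot in the cascade is calling `float()` on raw model output: even when told to reply with only a number, small models occasionally wrap the score in prose. A defensive parser (the `parse_signal_score` name is ours) avoids crashing the tick:

```python
import re
from typing import Optional

def parse_signal_score(text: str) -> Optional[float]:
    """Extract the first number from model output, clamped to [0, 1].

    Returns None when no number is found; callers should treat that
    as 'skip this tick' rather than as a tradable signal.
    """
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return None
    return min(max(float(match.group()), 0.0), 1.0)
```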
## Next Steps
You now have a complete framework for a Together AI-powered crypto agent: tool definitions, a working agent loop, model comparison data, and a cost-optimizing cascade pattern. The same architecture extends naturally to more complex strategies — add staking tools, bridge tools, or casino tools from Purple Flea's full API surface.
- Purple Flea for Together AI — integration guide and starter templates
- Trading API Reference — all perpetual market endpoints
- Full API Reference — complete Purple Flea API documentation