Together AI hosts over 100 open-source models — Llama 3.1, Mixtral, DeepSeek-V3, Qwen 2.5 — all accessible through a single, OpenAI-compatible API. For developers building crypto agents, this means you can swap models in and out freely, compare financial reasoning quality across architectures, and run inference at a fraction of the cost of proprietary APIs.
This guide walks through building a complete crypto trading agent that uses Together AI for inference and Purple Flea for execution. By the end you will have a working agent loop that queries market prices, reasons about opportunities, and places real trades.
## Why Together AI for Financial Agents
- 100+ open models — every major open-source release available via one API key, including all Llama, Mixtral, DeepSeek, Qwen, and Gemma variants
- Sub-100ms inference for many 7B-70B models — critical for agents running tight decision loops
- Easy model comparison — test whether Llama 3.1 70B or DeepSeek-V3 reasons better about your specific trading signals without changing any other code
- Cost efficiency — $0.20–$2.00 per million tokens means you can run hundreds of agent loops per dollar, making continuous operation economical
- OpenAI-compatible API — drop-in replacement; no SDK migration required if you're coming from GPT-4
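The compatibility claim in the last bullet is concrete: Together serves the same chat completions request shape at `https://api.together.xyz/v1`, so an existing OpenAI-style payload works unchanged and only the base URL, key, and model name differ. A minimal sketch (the `chat_payload` helper is our own illustration, not part of either SDK):

```python
# Build an OpenAI-style chat completions body. The same JSON shape is
# accepted by api.openai.com and api.together.xyz; only base URL, API
# key, and model name change.
TOGETHER_BASE = "https://api.together.xyz/v1"

def chat_payload(model: str, user_msg: str, max_tokens: int = 256) -> dict:
    """OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": max_tokens,
    }

payload = chat_payload("meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo", "ping")
# POST this to f"{TOGETHER_BASE}/chat/completions" with your bearer token.
```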
## Setup

Install the Together Python SDK and the requests library:

```bash
pip install together requests
```

Export your API keys:

```bash
export TOGETHER_API_KEY="your-together-api-key"
export PURPLEFLEA_API_KEY="your-purpleflea-api-key"
```
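A missing key only surfaces later as an opaque 401, so it is worth failing fast at startup. A small check along these lines (the `missing_keys` helper is ours, not part of either SDK):

```python
import os

REQUIRED_KEYS = ("TOGETHER_API_KEY", "PURPLEFLEA_API_KEY")

def missing_keys(env=os.environ):
    """Return the names of required API keys that are unset or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

# At startup, fail fast with a readable message instead of a late auth error:
# if missing_keys():
#     raise SystemExit(f"Set these environment variables: {missing_keys()}")
```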
## Tool Definitions for Purple Flea
Together AI supports function calling using the same JSON schema format as OpenAI. Define the Purple Flea tools you want the model to use:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_market_price",
            "description": (
                "Get the current price and 24h change for a crypto perpetual market. "
                "Call this before making any trading decision."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Market symbol, e.g. BTC-PERP, ETH-PERP, SOL-PERP"
                    }
                },
                "required": ["symbol"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "execute_trade",
            "description": (
                "Open a long or short perpetual position on Purple Flea Trading. "
                "Only call this when you have a clear directional view."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {"type": "string", "description": "Market symbol"},
                    "side": {"type": "string", "enum": ["long", "short"]},
                    "size_usd": {"type": "number", "description": "Position size in USD"},
                    "leverage": {
                        "type": "number",
                        "description": "Leverage multiplier, 1-10x. Default to 2 if uncertain."
                    }
                },
                "required": ["symbol", "side", "size_usd"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_portfolio",
            "description": "Get current open positions and available balance.",
            "parameters": {"type": "object", "properties": {}}
        }
    }
]
```
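The schema tells the model what to send, but nothing enforces it at runtime: models occasionally omit a required field or pick a leverage outside 1-10. A defensive check before dispatch is cheap insurance. A sketch (the `validate_trade_args` helper is our own, not part of the Purple Flea API):

```python
def validate_trade_args(args: dict) -> dict:
    """Check execute_trade arguments against the tool schema before dispatch.

    Returns normalized args, or raises ValueError with a message that can
    be fed back to the model as a tool error result.
    """
    for field in ("symbol", "side", "size_usd"):
        if field not in args:
            raise ValueError(f"Missing required field: {field}")
    if args["side"] not in ("long", "short"):
        raise ValueError(f"side must be 'long' or 'short', got {args['side']!r}")
    if args["size_usd"] <= 0:
        raise ValueError("size_usd must be positive")
    leverage = args.get("leverage", 2)  # schema default
    if not 1 <= leverage <= 10:
        raise ValueError(f"leverage must be between 1 and 10, got {leverage}")
    return {**args, "leverage": leverage}
```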
## Full Agent Loop
The core loop is a standard agentic ReAct pattern: observe, reason, act, repeat until the model returns a final answer with no tool calls.
```python
import json
import os

import requests
import together

client = together.Together(api_key=os.environ["TOGETHER_API_KEY"])

PF_KEY = os.environ["PURPLEFLEA_API_KEY"]
PF_BASE = "https://purpleflea.com/api/v1"
HEADERS = {
    "Authorization": f"Bearer {PF_KEY}",
    "Content-Type": "application/json"
}


def execute_tool(name: str, args: dict) -> dict:
    """Dispatch tool calls to the Purple Flea API."""
    if name == "get_market_price":
        r = requests.get(
            f"{PF_BASE}/markets/{args['symbol']}/price",
            headers=HEADERS,
            timeout=10
        )
        return r.json()
    elif name == "execute_trade":
        r = requests.post(
            f"{PF_BASE}/trade",
            json={
                "symbol": args["symbol"],
                "side": args["side"],
                "size": args["size_usd"],
                "leverage": args.get("leverage", 2)
            },
            headers=HEADERS,
            timeout=10
        )
        return r.json()
    elif name == "get_portfolio":
        r = requests.get(f"{PF_BASE}/portfolio", headers=HEADERS, timeout=10)
        return r.json()
    return {"error": f"Unknown tool: {name}"}


def run_agent(model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"):
    messages = [
        {
            "role": "system",
            "content": (
                "You are an autonomous crypto trading agent operating on Purple Flea Trading. "
                "Your goal is to analyse current market conditions for BTC and ETH, check your "
                "existing portfolio, and make a trading decision. "
                "Be concise in your reasoning. Always check prices before trading. "
                "Never risk more than 20% of available balance on a single trade."
            )
        },
        {
            "role": "user",
            "content": "Analyse the market and take any action you think is appropriate."
        }
    ]

    print(f"Running agent with model: {model}\n")

    while True:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            tool_choice="auto",
            max_tokens=1024
        )
        msg = response.choices[0].message

        # Serialize the assistant turn back into the history. Only include
        # tool_calls when present; the SDK's tool-call objects are converted
        # to plain dicts so the list can be sent back on the next request.
        assistant_turn = {"role": "assistant", "content": msg.content}
        if msg.tool_calls:
            assistant_turn["tool_calls"] = [tc.model_dump() for tc in msg.tool_calls]
        messages.append(assistant_turn)

        # No tool calls — agent has finished reasoning
        if not msg.tool_calls:
            print("Agent decision:\n")
            print(msg.content)
            break

        # Execute each tool call and feed results back
        for tc in msg.tool_calls:
            fn_name = tc.function.name
            fn_args = json.loads(tc.function.arguments)
            print(f"  [tool] {fn_name}({fn_args})")
            result = execute_tool(fn_name, fn_args)
            print(f"  [result] {json.dumps(result)[:200]}")
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": json.dumps(result)
            })


if __name__ == "__main__":
    run_agent()
```
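Both APIs can fail transiently (rate limits, network blips), and a single unhandled exception would kill the loop mid-run. A simple exponential backoff wrapper, our own sketch rather than part of either SDK, keeps one flaky request from ending the session:

```python
import time

def backoff_delays(retries: int, base: float = 1.0) -> list:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(retries)]

def with_retries(fn, retries: int = 3, base: float = 1.0):
    """Call fn(); on exception, sleep per the schedule and retry."""
    delays = backoff_delays(retries, base)
    for attempt, delay in enumerate(delays):
        try:
            return fn()
        except Exception:
            if attempt == len(delays) - 1:
                raise  # out of retries: surface the real error
            time.sleep(delay)

# Usage inside the loop:
#   result = with_retries(lambda: execute_tool(fn_name, fn_args))
```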
To try this with zero risk, point `execute_trade` at the paper trading endpoint instead: `POST /api/v1/paper-trade`. All signals and reasoning stay real; no funds are moved.
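One low-friction way to wire that in is a dry-run flag that swaps the endpoint inside the dispatcher and leaves everything else untouched. A sketch (the `PF_DRY_RUN` flag and `trade_path` helper are our own; the `/paper-trade` path is the endpoint mentioned above):

```python
import os

# Set PF_DRY_RUN=1 to route every trade to the paper-trading endpoint.
DRY_RUN = os.environ.get("PF_DRY_RUN", "0") == "1"

def trade_path(dry_run: bool = DRY_RUN) -> str:
    """Endpoint path for execute_trade: paper or real."""
    return "/paper-trade" if dry_run else "/trade"

# In execute_tool, replace the hard-coded "/trade" with:
#   r = requests.post(f"{PF_BASE}{trade_path()}", json=..., headers=HEADERS)
```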
## Model Comparison for Finance Tasks
Different models have different strengths for financial reasoning. Here is how the major Together AI models compare on crypto agent tasks:
| Model | Latency | Cost / 1M tokens | Best For |
|---|---|---|---|
| Llama 3.1 70B Instruct Turbo | Fast | $0.88 | Best balance of speed, cost, and quality — recommended default |
| Llama 3.1 405B Instruct Turbo | Slow | $3.50 | Most capable for complex multi-step reasoning; use sparingly |
| Mixtral 8x22B Instruct | Fast | $1.20 | Fast MoE architecture; strong at structured output and tool use |
| DeepSeek-V3 | Medium | $0.27 | Best code generation and logical reasoning at very low cost |
| Qwen 2.5 72B Instruct | Medium | $0.90 | Strong quantitative and mathematical reasoning |
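The cost column maps directly to a per-run budget: an agent loop of a few tool-call rounds consumes on the order of 3,000 tokens. A small helper makes the math explicit (the helper is ours; prices come from the table, and the full model IDs are our best-effort mapping to Together's catalog, so double-check both against current pricing):

```python
PRICE_PER_M = {  # USD per 1M tokens, from the table above
    "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo": 0.88,
    "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo": 3.50,
    "mistralai/Mixtral-8x22B-Instruct-v0.1": 1.20,
    "deepseek-ai/DeepSeek-V3": 0.27,
    "Qwen/Qwen2.5-72B-Instruct-Turbo": 0.90,
}

def run_cost(model: str, tokens: int) -> float:
    """Estimated USD cost of one agent run consuming `tokens` tokens."""
    return PRICE_PER_M[model] * tokens / 1_000_000

cost = run_cost("meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo", 3_000)
# ≈ $0.0026 per run, i.e. roughly 380 runs per dollar
```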
## Multi-Model Cascade Strategy
Running the most expensive model on every tick is wasteful. The most cost-effective approach is a signal cascade: a cheap fast model filters incoming data and only escalates to the expensive model when a signal exceeds a threshold.
```python
def run_cascade_agent():
    """
    Stage 1: Fast cheap model evaluates raw market data and scores the signal.
    Stage 2: Expensive large model decides and acts only if signal score > 0.8.
    """
    # Stage 1: Signal scoring with Llama 8B (fast, cheap)
    scores_response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
        messages=[
            {"role": "system", "content": "Rate this market condition as a trading signal from 0.0 to 1.0. Reply with only a number."},
            {"role": "user", "content": get_market_snapshot()}  # returns raw price/volume data
        ],
        max_tokens=8
    )
    signal_score = float(scores_response.choices[0].message.content.strip())
    print(f"Signal score: {signal_score:.2f}")

    if signal_score < 0.8:
        print("Signal below threshold — skipping this tick.")
        return

    # Stage 2: Full reasoning with Llama 405B (slow, expensive — but only when needed)
    print("Signal strong — escalating to full reasoning model.")
    run_agent(model="meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo")
```
In practice this reduces calls to the expensive model by 70–90% while still executing on the best opportunities. The filtering pass costs less than $0.001 per evaluation, making it economical to run every few seconds.
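One fragile spot in the cascade is calling `float()` on raw model output: even when told to reply with only a number, small models occasionally wrap the score in prose. A defensive parser (the `parse_signal_score` name is ours) avoids crashing the tick:

```python
import re
from typing import Optional

def parse_signal_score(text: str) -> Optional[float]:
    """Extract the first number from model output, clamped to [0, 1].

    Returns None when no number is found; callers should treat that
    as 'skip this tick' rather than as a tradable signal.
    """
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return None
    return min(max(float(match.group()), 0.0), 1.0)
```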
## Next Steps
You now have a complete framework for a Together AI-powered crypto agent: tool definitions, a working agent loop, model comparison data, and a cost-optimizing cascade pattern. The same architecture extends naturally to more complex strategies — add staking tools, bridge tools, or casino tools from Purple Flea's full API surface.
- Purple Flea for Together AI — integration guide and starter templates
- Trading API Reference — all perpetual market endpoints
- Full API Reference — complete Purple Flea API documentation