Time Series Forecasting for AI Trading Agents: LSTM, Prophet, and Beyond

Autonomous AI trading agents need an edge — and that edge starts with knowing what prices are likely to do before placing a single order. Time series forecasting, once the exclusive domain of quantitative hedge funds, is now accessible to any agent with a few hundred lines of Python and access to historical OHLCV data.

This guide walks through the complete forecasting stack: from classical ARIMA baselines to LSTM recurrent networks, Facebook Prophet for seasonality decomposition, lightweight Transformer architectures, and finally ensemble methods that blend all of the above into a single probability distribution your agent can act on.

All execution examples use the Purple Flea Trading API, which accepts orders programmatically and supports both market and limit order types with sub-second latency.

7.2% Avg. MAE reduction vs. naive

82 ms Avg. order latency (Purple Flea)

5 Models in the ensemble

48h Max forecast horizon

1. Forecasting Methods Overview

No single model wins across all market regimes. The practical answer is a toolkit — choose the right model for the regime, or blend them all. Here is how the major families compare:

Model	Strengths	Weaknesses	Best for
ARIMA	Fast, interpretable, no GPU	Linear only, stationarity req'd	Short-horizon, mean-reverting regimes
LSTM	Captures nonlinear long-range deps	Needs large data, hyperparameter sensitive	Multi-step trend continuation
Prophet	Handles seasonality, holidays	Weak on pure noise series	Daily/weekly crypto patterns
Transformer	State-of-the-art on long sequences	Heavy compute, needs pretraining	1h–48h horizon with full orderbook
Ensemble	Best generalization across regimes	Complexity, latency overhead	Production agents requiring robustness

For a trading agent that needs to make decisions every few minutes, a lightweight ensemble of ARIMA + LSTM trained on the last 90 days of 1-minute candles is the sweet spot — fast enough to re-forecast on every bar without blocking the event loop.

Setting the Forecast Horizon

Match your horizon to your holding period. If your agent trades mean-reversion on 5-minute candles, you need a 5–30 minute forecast. If it's a trend-following agent on 4-hour candles, forecast 24–48 hours ahead. Longer horizons require wider confidence intervals — make sure your position sizing reflects that uncertainty.

Rule of thumb: Forecast horizon should not exceed 10% of your training window. A model trained on 1,000 candles should not reliably forecast beyond 100 candles forward.

2. Feature Engineering for Price Forecasting

Raw OHLCV data is necessary but not sufficient. The features you engineer are often more predictive than the model architecture. This section covers the essential feature set for a crypto forecasting pipeline.

Technical Indicator Features

Beyond the raw close price, feed your model these derived signals:

Log returns — log(close_t / close_{t-1}) — stationary and symmetric, preferred over raw price
Realized volatility — rolling standard deviation of log returns over 14, 30, and 60 periods
RSI (14) — momentum measure, normalized to [0, 1]
MACD signal — difference between EMA12 and EMA26
Bollinger Band width — (upper - lower) / middle, a volatility regime indicator
Volume z-score — standardized volume vs. rolling 30-period mean
Hour-of-day and day-of-week — one-hot encoded, captures crypto's strong weekend effects

Python — feature_engineering.py

import pandas as pd
import numpy as np
from ta import add_all_ta_features

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """
    df must have columns: open, high, low, close, volume
    Returns df with additional feature columns.
    """
    df = df.copy()

    # ── Log returns and lagged returns ──
    df['log_return'] = np.log(df['close'] / df['close'].shift(1))
    for lag in [1, 2, 3, 5, 10, 20]:
        df[f'return_lag_{lag}'] = df['log_return'].shift(lag)

    # ── Realized volatility (multiple windows) ──
    for window in [14, 30, 60]:
        df[f'realvol_{window}'] = (
            df['log_return']
            .rolling(window)
            .std() * np.sqrt(252 * 24)  # annualized
        )

    # ── RSI ──
    delta = df['close'].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    rs = gain / loss.replace(0, np.nan)
    df['rsi_14'] = (100 - (100 / (1 + rs))) / 100

    # ── MACD ──
    ema12 = df['close'].ewm(span=12).mean()
    ema26 = df['close'].ewm(span=26).mean()
    df['macd'] = (ema12 - ema26) / df['close']

    # ── Bollinger Band width ──
    mid = df['close'].rolling(20).mean()
    std = df['close'].rolling(20).std()
    df['bb_width'] = (4 * std) / mid

    # ── Volume z-score ──
    vol_mean = df['volume'].rolling(30).mean()
    vol_std = df['volume'].rolling(30).std()
    df['volume_z'] = (df['volume'] - vol_mean) / vol_std.replace(0, np.nan)

    # ── Temporal features ──
    df['hour_sin'] = np.sin(2 * np.pi * df.index.hour / 24)
    df['hour_cos'] = np.cos(2 * np.pi * df.index.hour / 24)
    df['dow_sin']  = np.sin(2 * np.pi * df.index.dayofweek / 7)
    df['dow_cos']  = np.cos(2 * np.pi * df.index.dayofweek / 7)

    return df.dropna()


# Example usage
if __name__ == '__main__':
    import requests
    # Fetch 90 days of hourly candles from Purple Flea Trading API
    resp = requests.get(
        'https://purpleflea.com/trading-api/candles',
        params={'symbol': 'BTC/USDC', 'interval': '1h', 'limit': 2160},
        headers={'X-API-Key': 'YOUR_KEY'}
    )
    raw = pd.DataFrame(resp.json()['candles'])
    raw['timestamp'] = pd.to_datetime(raw['timestamp'], unit='ms')
    raw = raw.set_index('timestamp')

    features = engineer_features(raw)
    print(features.shape)  # e.g. (2100, 22)

Target Variable Construction

Rather than predicting the absolute future price (which drifts with market regimes), predict the n-step forward log return. This keeps the target stationary, symmetric, and comparable across different price levels and time periods.

For classification-style agents (long / flat / short), discretize the forward return into three buckets: below -0.5%, between ±0.5% (flat), and above +0.5%. Adjust the threshold based on your transaction costs — there's no point predicting a 0.1% move if fees eat 0.2%.

3. LSTM Implementation

Long Short-Term Memory networks are the workhorse of sequential price forecasting. Their gating mechanism allows them to selectively remember long-range dependencies — critical for capturing how a breakout from three days ago still influences price action today.

Architecture Design

For hourly crypto data, a two-layer LSTM with 128 units each, followed by dropout (0.2) and a dense output layer, strikes the right balance between expressiveness and overfitting resistance. Use a lookback window of 96 hours (4 days) as the sequence length.

Python — lstm_forecaster.py

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import StandardScaler
from dataclasses import dataclass
from typing import Tuple


class TimeSeriesDataset(Dataset):
    def __init__(self, X: np.ndarray, y: np.ndarray):
        self.X = torch.FloatTensor(X)
        self.y = torch.FloatTensor(y)

    def __len__(self): return len(self.X)

    def __getitem__(self, idx): return self.X[idx], self.y[idx]


class LSTMForecaster(nn.Module):
    def __init__(
        self,
        input_size: int,
        hidden_size: int = 128,
        num_layers: int = 2,
        dropout: float = 0.2,
        horizon: int = 1,
    ):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            batch_first=True,
        )
        self.norm  = nn.LayerNorm(hidden_size)
        self.drop  = nn.Dropout(dropout)
        self.out   = nn.Linear(hidden_size, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, features)
        lstm_out, _ = self.lstm(x)
        last = lstm_out[:, -1, :]    # take final hidden state
        last = self.norm(last)
        last = self.drop(last)
        return self.out(last)          # (batch, horizon)


def build_sequences(
    data: np.ndarray,
    lookback: int = 96,
    horizon: int = 6,
) -> Tuple[np.ndarray, np.ndarray]:
    """Create sliding window sequences."""
    X, y = [], []
    target_col = 0  # log_return is column 0
    for i in range(len(data) - lookback - horizon + 1):
        X.append(data[i : i + lookback])
        y.append(data[i + lookback : i + lookback + horizon, target_col])
    return np.array(X), np.array(y)


def train_lstm(
    features: np.ndarray,
    epochs: int = 40,
    batch_size: int = 64,
    lr: float = 1e-3,
    lookback: int = 96,
    horizon: int = 6,
):
    scaler = StandardScaler()
    scaled = scaler.fit_transform(features)

    X, y = build_sequences(scaled, lookback, horizon)
    split = int(0.85 * len(X))
    train_ds = TimeSeriesDataset(X[:split], y[:split])
    val_ds   = TimeSeriesDataset(X[split:], y[split:])

    train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
    val_dl   = DataLoader(val_ds, batch_size=batch_size)

    model = LSTMForecaster(
        input_size=features.shape[1],
        horizon=horizon,
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, epochs)
    criterion = nn.HuberLoss(delta=0.5)

    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for xb, yb in train_dl:
            optimizer.zero_grad()
            pred = model(xb)
            loss = criterion(pred, yb)
            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            train_loss += loss.item()

        scheduler.step()

        if (epoch + 1) % 10 == 0:
            model.eval()
            val_loss = 0.0
            with torch.no_grad():
                for xb, yb in val_dl:
                    val_loss += criterion(model(xb), yb).item()
            print(f"Epoch {epoch+1}: train={train_loss/len(train_dl):.5f}  val={val_loss/len(val_dl):.5f}")

    return model, scaler

Preventing Data Leakage

The most common mistake in backtesting LSTM models is fitting the scaler on the entire dataset before splitting. Always fit StandardScaler only on the training portion, then apply the same transform to validation and test sets. Any look-ahead in normalization produces inflated backtest metrics that collapse in live trading.

4. Ensemble Forecasting

Ensemble methods combine multiple forecasters, reducing variance and improving robustness to regime changes. The simplest ensemble is a weighted average, where weights are inversely proportional to each model's recent validation loss.

Python — ensemble_forecaster.py

from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet
import numpy as np

class EnsembleForecaster:
    def __init__(self, lstm_model, scaler, lookback=96, horizon=6):
        self.lstm     = lstm_model
        self.scaler   = scaler
        self.lookback = lookback
        self.horizon  = horizon
        self.weights  = np.array([0.5, 0.25, 0.25])  # LSTM, ARIMA, Prophet

    def forecast_arima(self, returns: np.ndarray) -> np.ndarray:
        try:
            model = ARIMA(returns[-500:], order=(2, 0, 2))
            fit = model.fit(method_kwargs={'warn_convergence': False})
            return fit.forecast(steps=self.horizon)
        except:
            return np.zeros(self.horizon)

    def forecast_prophet(self, df_prophet) -> np.ndarray:
        try:
            m = Prophet(
                daily_seasonality=True,
                weekly_seasonality=True,
                changepoint_prior_scale=0.05,
                seasonality_mode='multiplicative',
            )
            m.fit(df_prophet, verbose=False)
            future = m.make_future_dataframe(periods=self.horizon, freq='H')
            forecast = m.predict(future)
            return forecast['yhat'].iloc[-self.horizon:].values
        except:
            return np.zeros(self.horizon)

    def forecast_lstm(self, features: np.ndarray) -> np.ndarray:
        import torch
        seq = self.scaler.transform(features[-self.lookback:])
        x   = torch.FloatTensor(seq).unsqueeze(0)
        with torch.no_grad():
            pred = self.lstm(x).squeeze().numpy()
        return pred

    def forecast(self, features, returns, df_prophet) -> dict:
        lstm_pred   = self.forecast_lstm(features)
        arima_pred  = self.forecast_arima(returns)
        prophet_ret = np.diff(np.log(self.forecast_prophet(df_prophet)))

        # Normalize prophet to same horizon length
        prophet_pred = prophet_ret[:self.horizon] if len(prophet_ret) >= self.horizon \
                       else np.pad(prophet_ret, (0, self.horizon - len(prophet_ret)))

        ensemble = (
            self.weights[0] * lstm_pred +
            self.weights[1] * arima_pred +
            self.weights[2] * prophet_pred
        )

        return {
            'ensemble': ensemble,
            'lstm': lstm_pred,
            'arima': arima_pred,
            'prophet': prophet_pred,
            'direction': 'long' if ensemble[0] > 0.001 else (
                'short' if ensemble[0] < -0.001 else 'flat'),
        }

5. Live Trading Integration with Purple Flea

Once your ensemble produces a directional signal, executing it via the Purple Flea Trading API is straightforward. The API accepts JSON orders over HTTPS and returns confirmation within ~80ms on average.

Signal to Order Pipeline

The agent loop runs on a cron-like scheduler: every hour, fetch the latest candles, re-compute features, run the ensemble, and conditionally place or cancel orders based on the signal and current position state.

Python — trading_agent.py

import asyncio
import httpx
from datetime import datetime, timezone

API_BASE = 'https://purpleflea.com/trading-api'
API_KEY  = 'YOUR_PURPLEFLEA_API_KEY'
HEADERS  = {'X-API-Key': API_KEY, 'Content-Type': 'application/json'}


async def get_candles(symbol: str, interval: str, limit: int) -> list:
    async with httpx.AsyncClient() as c:
        r = await c.get(
            f'{API_BASE}/candles',
            params={'symbol': symbol, 'interval': interval, 'limit': limit},
            headers=HEADERS,
        )
        r.raise_for_status()
        return r.json()['candles']


async def place_order(symbol: str, side: str, qty: float, order_type='market') -> dict:
    async with httpx.AsyncClient() as c:
        r = await c.post(
            f'{API_BASE}/orders',
            json={
                'symbol': symbol,
                'side': side,
                'quantity': qty,
                'type': order_type,
            },
            headers=HEADERS,
        )
        r.raise_for_status()
        return r.json()


async def agent_loop(forecaster, symbol='BTC/USDC', trade_qty=0.001):
    position = 'flat'

    while True:
        try:
            # 1. Fetch latest data
            candles = await get_candles(symbol, '1h', 300)
            df = pd.DataFrame(candles).set_index('timestamp')
            features_df = engineer_features(df)

            # 2. Run ensemble
            result = forecaster.forecast(
                features=features_df.values,
                returns=features_df['log_return'].values,
                df_prophet=features_df[['ds', 'y']],
            )
            signal = result['direction']
            conf   = abs(result['ensemble'][0])

            print(f"[{datetime.now(timezone.utc).isoformat()}] signal={signal} conf={conf:.5f}")

            # 3. Execute orders
            if signal == 'long' and position != 'long' and conf > 0.002:
                if position == 'short':
                    await place_order(symbol, 'buy', trade_qty * 2)
                else:
                    await place_order(symbol, 'buy', trade_qty)
                position = 'long'

            elif signal == 'short' and position != 'short' and conf > 0.002:
                if position == 'long':
                    await place_order(symbol, 'sell', trade_qty * 2)
                else:
                    await place_order(symbol, 'sell', trade_qty)
                position = 'short'

        except Exception as e:
            print(f"Agent error: {e}")

        # Wait for next hourly candle
        await asyncio.sleep(3600)


if __name__ == '__main__':
    asyncio.run(agent_loop(forecaster))

Risk management: Always implement a hard stop-loss in the agent loop. Forecasting models are not perfect — a single bad prediction in a high-volatility regime can exceed the cumulative gains of many correct predictions if position sizing is unconstrained.

Model Retraining Schedule

Crypto markets are non-stationary. Models trained three months ago may be capturing patterns that no longer exist. A practical retraining schedule: retrain the LSTM weekly on a rolling 90-day window, update the ARIMA order selection monthly, and let Prophet retrain every run (it is fast enough). Automate this with a APScheduler job that runs at 00:00 UTC each Monday.

6. Transformer Models for Long-Horizon Forecasting

When your agent needs 24–48 hour forecasts with the full feature set, a lightweight Transformer encoder outperforms LSTM in most empirical evaluations on financial data. The attention mechanism allows the model to directly attend to relevant historical timestamps without the gradient vanishing issues that plague deep LSTMs.

For production use, the Temporal Fusion Transformer (TFT) by Lim et al. is the current state of the art for multi-horizon probabilistic forecasting. It outputs quantile forecasts (10th, 50th, 90th percentile), which lets your agent express uncertainty-aware position sizing: smaller positions when the forecast confidence interval is wide, larger when it is narrow.

Key Transformer Hyperparameters

Attention heads: 4–8 for most crypto datasets; more heads do not consistently help on sequences shorter than 1,000 steps
d_model: 64–128. Larger models overfit on typical 90-day datasets
Positional encoding: Use sinusoidal encoding, but add the timestamp features as additional inputs rather than relying on position alone
Training objective: Quantile loss (pinball loss) for probabilistic outputs, or MSE for point forecasts

Practical tip: Before investing in Transformer training, check if a well-tuned LSTM with good feature engineering already saturates the Sharpe ratio. Transformers rarely add more than 10–15% improvement on datasets under 100,000 samples.

Start Trading with Your Forecasts

Purple Flea provides a full trading API with market orders, limit orders, and position management. New agents can claim free USDC from the faucet to test their forecasting strategies in live markets — no funding required.

View Trading API Docs Claim Free USDC