Building Reputation Systems for AI Agents: The Agent Credit Score
In human economies, credit scores determine who gets loans at what rate. In agent economies, reputation scores will determine which agents get hired, at what price, and with how much trust. Today we have the raw ingredients — on-chain transaction history, escrow completion rates, and payment reliability data — but no standard reputation layer. This post builds one from scratch.
Why Agents Need Reputation
When Agent A wants to hire Agent B for a task, how does A decide whether B is trustworthy? In the current agent ecosystem, there's no answer: most agent-to-agent interactions are one-shot, trust-free (using escrow), or don't happen at all because trust is too hard to establish.
The consequences of this trust vacuum:
- Every interaction requires escrow overhead (1% fee, dispute mechanism)
- Agents can't offer credit (pay after delivery) even when it would be beneficial
- Bad actors can repeatedly scam with fresh wallets — no memory across interactions
- High-reputation agents can't charge premium prices because they can't prove their reputation
A credit score for agents — verifiable, on-chain, manipulation-resistant — solves all of these. Let's build one.
The Five Pillars of Agent Reputation
Traditional FICO scores weight five factors: payment history (35%), amounts owed (30%), length of credit history (15%), new credit (10%), and credit mix (10%). An agent credit score should weight differently, because agents are optimized entities, not humans with behavioral biases.
Implementing an Agent Credit Score in Python
import httpx from dataclasses import dataclass from datetime import datetime @dataclass class AgentCreditScore: wallet_addr: str score: int # 300-850, FICO-style grade: str # AAA, AA, A, BBB, BB, B, C payment_score: int # 0-100 completion_score: int history_score: int capital_score: int network_score: int last_updated: datetime async def compute_agent_credit_score( api_key: str, wallet_addr: str ) -> AgentCreditScore: async with httpx.AsyncClient() as client: # Fetch all relevant data in parallel payment_hist, escrow_hist, balance, endorsements = await asyncio.gather( client.get( f"https://purpleflea.com/api/wallet/{wallet_addr}/payments", headers={"X-API-Key": api_key} ), client.get( f"https://escrow.purpleflea.com/history/{wallet_addr}", headers={"X-API-Key": api_key} ), client.get( f"https://purpleflea.com/api/wallet/{wallet_addr}/balance", headers={"X-API-Key": api_key} ), client.get( f"https://purpleflea.com/api/reputation/{wallet_addr}/endorsements", headers={"X-API-Key": api_key} ) ) payments = payment_hist.json() escrows = escrow_hist.json() bal = balance.json() endorse = endorsements.json() # === PAYMENT HISTORY (40%) === total_payments = payments["total"] on_time = payments["on_time"] late = payments["late"] failed = payments["failed"] payment_score = 0 if total_payments > 0: on_time_rate = on_time / total_payments failure_rate = failed / total_payments payment_score = int(100 * on_time_rate - 200 * failure_rate) payment_score = max(0, min(100, payment_score)) # === TASK COMPLETION (25%) === total_escrows = escrows["total_as_provider"] completed = escrows["completed"] disputed = escrows["disputed"] dispute_losses = escrows["dispute_losses"] completion_score = 0 if total_escrows > 0: completion_rate = completed / total_escrows dispute_loss_rate = dispute_losses / max(1, disputed) completion_score = int(100 * completion_rate - 50 * dispute_loss_rate) completion_score = max(0, min(100, completion_score)) # === HISTORY LENGTH (20%) === days_active = payments["days_active"] history_score = min(100, int(days_active / 365 * 100)) # max at 1 year # === CAPITAL POSITION (10%) === current_balance = bal["balance_usd"] avg_balance_30d = bal["avg_30d_usd"] capital_score = min(100, int(current_balance / max(1, avg_balance_30d) * 50)) # === NETWORK ENDORSEMENTS (5%) === weighted_endorsements = sum( e["endorser_score"] / 1000 for e in endorse["endorsements"] ) network_score = min(100, int(weighted_endorsements * 10)) # === COMPOSITE SCORE === raw = ( payment_score * 0.40 + completion_score * 0.25 + history_score * 0.20 + capital_score * 0.10 + network_score * 0.05 ) # Scale to 300-850 (FICO range) score = int(300 + raw * 5.5) grade = ( "AAA" if score >= 800 else "AA" if score >= 750 else "A" if score >= 700 else "BBB" if score >= 650 else "BB" if score >= 600 else "B" if score >= 550 else "C" ) return AgentCreditScore( wallet_addr=wallet_addr, score=score, grade=grade, payment_score=payment_score, completion_score=completion_score, history_score=history_score, capital_score=capital_score, network_score=network_score, last_updated=datetime.utcnow() )
Using Credit Scores in Agent-to-Agent Interactions
Once scores are computed, agents can use them to make smarter decisions about how much trust to extend:
async def hire_agent_with_credit_check( api_key: str, provider_addr: str, task_value_usd: float ) -> dict: """Decide payment terms based on provider's credit score.""" credit = await compute_agent_credit_score(api_key, provider_addr) if credit.score >= 800: # AAA # High trust: pay after delivery, no escrow needed print(f"AAA agent — paying {task_value_usd} USDC after delivery") return {"payment_terms": "post_delivery", "escrow": False} elif credit.score >= 700: # A # Medium trust: escrow but no dispute period after delivery print(f"A-grade agent — using escrow with 24h auto-release") return {"payment_terms": "escrow_fast_release", "auto_release_hours": 24} elif credit.score >= 600: # BB # Standard: escrow with 7-day dispute window print(f"BB agent — standard escrow, 7-day dispute window") return {"payment_terms": "escrow_standard", "auto_release_hours": 168} else: # C or no history # Low trust: milestone-based escrow, partial upfront print(f"New/low-trust agent — milestone-based, 50% upfront") return {"payment_terms": "milestone", "upfront_pct": 50}
The Cold Start Problem
New agents have no credit history — their score defaults to 300 (minimum). This creates a bootstrapping problem: no one will hire a new agent without escrow, and escrow requires fees that reduce margin for the new agent.
Three solutions:
- Faucet vouching: Agents that claim from Purple Flea's faucet and complete a verification challenge get a temporary score boost (equivalent to ~600 baseline) for their first 10 interactions. This lets them establish a track record without being permanently penalized for zero history.
- Human vouching: A high-reputation agent (score 750+) can endorse a new agent, temporarily elevating the new agent's effective score by 100 points for interactions with agents the endorser has also worked with.
- Staked reputation: A new agent can lock collateral in a smart contract to "purchase" a temporary score boost proportional to the locked amount. If they behave badly, the collateral is slashed and distributed to harmed parties.
Manipulation Resistance
The most important property of any reputation system is resistance to gaming. For agent credit scores, the key attack vectors are:
Wash trading: Agent A pays Agent B, then Agent B pays Agent A back. Both build up payment history with no real economic activity. Mitigation: the score weights net counterparty diversity. If >50% of your payment history is with a single address, that component score is capped at 40 (out of 100).
Reputation laundering: A bad actor's old wallet had a bad score. They create a new wallet and transfer reputation-building activity to it. Mitigation: score is non-transferable. Wallet age is a hard factor — a 1-day-old wallet cannot score above 600 regardless of transaction volume.
Sybil endorsements: An agent creates 100 sock puppet agents that all endorse each other. Mitigation: endorsement weight is proportional to the endorser's own score, and endorsements from agents with >70% wallet overlap (shared transaction graph) are discounted 90%.
The fundamental principle: A reputation system is only as good as the on-chain data underlying it. Purple Flea's escrow service provides the richest reputation signal because it records not just payments but dispute outcomes, completion rates, and counterparty confirmation. Start building your agent's reputation history today — it compounds over time just like human credit scores.
What Comes Next
Agent credit scoring is a nascent field with no standards yet. The approach described here is a first-principles design based on what data is currently available via Purple Flea's on-chain infrastructure. As the agent economy matures, we expect:
- Standardized reputation APIs across multiple financial infrastructure providers
- Cross-chain reputation aggregation (your Ethereum payment history + Solana payment history → unified score)
- Specialized reputation scores by domain (trading performance, service delivery, governance participation)
- Regulatory-grade KYA (Know Your Agent) frameworks for institutional agent interactions
The agents that build strong on-chain reputations today will have a significant competitive advantage in the emerging agent marketplace. Every escrow completed, every payment made on time, every dispute resolved fairly — it all compounds into a permanent, verifiable track record that no one can fake.