An AI trading bot for Kalshi ingests news, social, and official feeds, generates a probability estimate for the YES side of a Kalshi market, and either alerts you or fires the trade, depending on your trust threshold. Technically: an LLM-as-classifier pipeline with structured output and reasoning traces, every decision logged. We use Anthropic Claude or your model of choice; we don't lock you in.
Auditable LLM execution requires structured outputs and stored reasoning traces. We refuse to ship without them.
News feeds (AP, Reuters, FT), official sources (Fed releases, BLS, EIA), and social feeds filtered for credible accounts. Configured per market, per category.
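As an illustrative sketch - the feed identifiers and the routing helper below are hypothetical, not the product's actual names - per-category source routing might look like:

```python
# Hypothetical per-category routing table. Feed identifiers and the
# helper are illustrative, not the product's actual configuration.
SOURCES_BY_CATEGORY = {
    "FED": ["fed-releases", "reuters-api"],
    "CPI": ["bls-releases", "reuters-api"],
    "BTCD": ["x-curated-list-id-42", "reuters-api"],
}

def sources_for(series: str) -> list[str]:
    """Return the feed identifiers wired to a Kalshi series,
    falling back to a default news feed for unlisted series."""
    return SOURCES_BY_CATEGORY.get(series, ["reuters-api"])
```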
LLM produces a structured output: probability estimate, top 3 cited sources, confidence level, dissenting evidence. Stored verbatim per decision.
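A minimal sketch of that record as a Python dataclass - the field names are assumptions mirroring the description above, not the actual `structured-prob-trace-v2` schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProbTrace:
    """One stored decision record. Field names are illustrative,
    mirroring the structured output described above."""
    market_ticker: str
    p_yes: float              # probability estimate for the YES side
    confidence: float         # model's self-rated confidence, 0..1
    top_sources: list[str]    # top 3 cited sources
    dissenting_evidence: str  # strongest evidence against the estimate
    reasoning: str            # verbatim reasoning trace

    def __post_init__(self):
        # Basic sanity checks before the record is stored.
        assert 0.0 <= self.p_yes <= 1.0
        assert 0.0 <= self.confidence <= 1.0
        assert len(self.top_sources) <= 3
```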
If confidence + edge clears your threshold → fire. Below threshold → alert and skip. New market categories → human-confirm first 10 trades to calibrate.
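The gate above can be sketched as a pure function. Threshold defaults match the config values shown on this page, but the function name and signature are illustrative:

```python
def decide(p_yes: float, market_price_cents: int, confidence: float,
           min_edge_cents: int = 4, min_confidence: float = 0.7,
           trades_in_category: int = 0, hitl_first_n: int = 10) -> str:
    """Illustrative decision gate: returns 'fire', 'alert', or
    'confirm' (human-in-the-loop for a new market category)."""
    edge_cents = p_yes * 100 - market_price_cents  # edge on the YES side
    if confidence < min_confidence or edge_cents < min_edge_cents:
        return "alert"    # below threshold: notify and skip
    if trades_in_category < hitl_first_n:
        return "confirm"  # new category: human confirms first N trades
    return "fire"
```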
Structured output schema is mandatory. We don't ship pipelines where the LLM's output is a free-form chat.
```yaml
# Kalshi AI bot - execution config
strategy: "llm-driven"
model: "claude-sonnet-4.5"  # or your endpoint
market_filter:
  series: ["FED", "CPI", "BTCD"]
  min_hours_to_resolution: 6
signal_pipeline:
  sources:
    - "reuters-api"
    - "fed-releases"
    - "x-curated-list-id-42"
  output_schema: "structured-prob-trace-v2"
  retain_traces: true  # audit requirement
decision:
  min_edge_cents: 4
  min_confidence: 0.7  # LLM self-rated
  hitl_first_n_trades: 10  # human-confirm initial
execution:
  size_usdc: 600
  slippage_bps: 75
```
Structured outputs with cited evidence. Disagreement detection across multiple model calls. Confidence thresholds. Human review on new categories. These aren't 'nice to have' - they're load-bearing.
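Disagreement detection can be as simple as sampling the model several times on the same market and flagging when the estimates spread. The metric below (population standard deviation) is a stand-in, not necessarily what the pipeline uses:

```python
import statistics

def disagreement(samples: list[float]) -> float:
    """Spread across repeated model calls on the same market.
    Stand-in metric: population standard deviation."""
    return statistics.pstdev(samples)

def consensus_or_flag(samples: list[float], max_spread: float = 0.05):
    """Return the mean probability estimate, or None to flag the
    market for human review when the calls disagree too much."""
    if disagreement(samples) > max_spread:
        return None
    return statistics.mean(samples)
```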
If your edge is being first to a public news item, LLMs are too slow. The AI bot's edge is consistency on noisy signal - reading 200 sources and synthesizing them when a human can't keep up.
Audit requirements mean storing prompts + outputs + reasoning indefinitely. We budget S3 + Postgres in the infra cost line. Don't ship without it.
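A local stand-in for that audit store, using sqlite3 in place of the S3 + Postgres pair budgeted above - the table and column names are illustrative:

```python
import json
import sqlite3
import time

# Local stand-in for the audit store. Production would target S3
# (raw prompts/outputs) plus Postgres (indexable decisions); the
# schema here is illustrative.
def init_store(conn: sqlite3.Connection) -> None:
    conn.execute("""CREATE TABLE IF NOT EXISTS traces (
        ts REAL, market TEXT, prompt TEXT, output_json TEXT)""")

def log_trace(conn: sqlite3.Connection, market: str,
              prompt: str, output: dict) -> None:
    """Persist prompt + structured output for one decision."""
    conn.execute("INSERT INTO traces VALUES (?, ?, ?, ?)",
                 (time.time(), market, prompt, json.dumps(output)))
    conn.commit()
```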
If you'd refuse to trade a market manually, your LLM doesn't suddenly have an edge. AI amplifies your thesis - it doesn't generate one.
Domains where LLMs read more than humans can.
Fed releases are 5-15 pages each. LLM extracts the key signal in seconds - most consistent edge.
BLS + Reuters + sell-side commentary. LLM aggregates all of it before a human could read one source.
SEC/CFTC filings, congressional hearings. LLM detects shifts in language faster than humans.
LLM ingests sources, posts signals to your Telegram. You execute manually.
Adds a HITL-to-autonomous execution pipeline with a full audit trail.
Tell us the Kalshi categories where you think LLMs would read better than you. We'll start with a small pipeline and benchmark.