Cassandra
Research prototype for multi-source event forecasting with probabilistic verdict traces.
Goal
Cassandra explores how to combine evidence from multiple noisy sources (news, polls, prediction markets, public datasets) into a single calibrated probability for a forecastable question. The output is not just that number; it is a trace that records how the probability was reached.
Architecture
A TypeScript orchestration API persists each queued judgment and enqueues a BullMQ job in Redis; a Python judge service then produces the terminal trace. Each judgment carries the search results that fed it, the prompt-template version, the model identifier, and a Brier-score self-assessment against historical predictions.
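As a rough illustration of the judgment fields listed above, here is a minimal sketch of what a trace record could look like on the Python side. The class names, field names, and example values are all hypothetical; the actual schema used by the judge service is not described here and may differ.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class SearchResult:
    """One piece of evidence that fed a judgment (hypothetical shape)."""
    source: str   # e.g. "news", "polls", "market", "dataset"
    url: str
    snippet: str


@dataclass
class JudgmentTrace:
    """Hypothetical trace record; field names are assumptions, not the real schema."""
    question: str
    probability: float
    search_results: list[SearchResult]
    prompt_template_version: str
    model_id: str
    brier_self_assessment: float

    def __post_init__(self) -> None:
        # A forecast probability outside [0, 1] is always a bug upstream.
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be in [0, 1]")


# Made-up example values, for illustration only.
trace = JudgmentTrace(
    question="Will event X happen by the end of the quarter?",
    probability=0.62,
    search_results=[
        SearchResult(source="news", url="https://example.com/a", snippet="..."),
        SearchResult(source="polls", url="https://example.com/b", snippet="..."),
    ],
    prompt_template_version="v3",
    model_id="example-model",
    brier_self_assessment=0.21,
)

# asdict() recurses into nested dataclasses, so the trace serializes as plain JSON.
print(json.dumps(asdict(trace), indent=2))
```

Keeping the prompt-template version and model identifier on every record means two traces for the same question can be compared without guessing which configuration produced them.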
Calibration
Every batch run produces a calibration plot (predicted probability vs. observed frequency) and a Brier score. The judge is deliberately tuned for honest uncertainty: it pulls probabilities toward 0.5 when sources disagree.
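The two batch metrics can be sketched in a few lines of plain Python. The Brier score is the mean squared difference between predicted probabilities and 0/1 outcomes, and the calibration plot is built from per-bin pairs of mean predicted probability and observed frequency. The function names below are hypothetical, and `shrink_toward_half` is only one possible way to express "pull toward 0.5": the linear rule and the `disagreement` input in [0, 1] are assumptions, not the judge's actual tuning.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def calibration_bins(
    probs: list[float], outcomes: list[int], n_bins: int = 10
) -> list[tuple[float, float, int]]:
    """Return (mean predicted probability, observed frequency, count) per non-empty bin."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))
    rows = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(o for _, o in b) / len(b)
            rows.append((mean_p, freq, len(b)))
    return rows


def shrink_toward_half(p: float, disagreement: float) -> float:
    """Assumed linear shrinkage: disagreement=0 keeps p, disagreement=1 returns 0.5."""
    if not 0.0 <= disagreement <= 1.0:
        raise ValueError("disagreement must be in [0, 1]")
    return 0.5 + (1.0 - disagreement) * (p - 0.5)


# Made-up forecasts and outcomes, for illustration only.
probs = [0.1, 0.1, 0.9]
outcomes = [0, 1, 1]
print(brier_score(probs, outcomes))
print(calibration_bins(probs, outcomes, n_bins=2))
print(shrink_toward_half(0.9, disagreement=0.5))
```

A well-calibrated judge yields bins whose observed frequency sits close to the mean predicted probability; plotting those pairs against the diagonal gives the calibration plot.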
What’s open
The interesting research questions are how to weight conflicting sources and how to detect when the corpus has shifted enough that historical calibration is no longer informative.