Live calibration

How well does the engine predict?

When the engine says P=0.40, are those leads actually approved 40% of the time? This page shows the reliability diagrams, Expected Calibration Error (ECE), and Brier scores computed from every outcome that has landed against this engine version.

No observations yet for engine 0.1.0.

Calibration metrics appear here once leads scored by this engine version produce observed outcomes — either via the matter-outcome cron or the operator outcome upload at /outcomes.

How to read these charts

Reliability diagram

Each dot is a probability bin. The x-axis is the engine's predicted probability; the y-axis is the observed approval rate in that bin. A perfectly calibrated model lies on the diagonal. Dots above the diagonal mean the engine underpredicted (approvals were more frequent than predicted); dots below mean it overpredicted.
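The dots in a reliability diagram can be computed roughly as in the sketch below. It assumes ten equal-width bins and plots each bin at its mean predicted probability; the page does not say how the engine actually bins, and the function name `reliability_bins` and the sample data are hypothetical.

```python
def reliability_bins(probs, outcomes, n_bins=10):
    """Return (mean_predicted, observed_rate, count) for each non-empty bin.

    Assumes equal-width bins over [0, 1] and outcomes coded as 0/1.
    This is an illustrative choice, not the engine's documented method.
    """
    # Per bin: [sum of predicted probabilities, number of approvals, count]
    totals = [[0.0, 0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        totals[idx][0] += p
        totals[idx][1] += y
        totals[idx][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in totals if n > 0]


# Made-up example: five leads scored 0.40, two of which were approved.
probs = [0.1, 0.1, 0.4, 0.4, 0.4, 0.4, 0.4, 0.9]
outcomes = [0, 0, 1, 1, 0, 0, 0, 1]
for mean_pred, observed, count in reliability_bins(probs, outcomes):
    side = "above" if observed > mean_pred else "on or below"
    print(f"x={mean_pred:.2f} y={observed:.2f} n={count} ({side} the diagonal)")
```

In this made-up data the 0.40 bin is calibrated (2 of 5 approved), the 0.10 bin sits below the diagonal, and the 0.90 bin sits above it.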

ECE (Expected Calibration Error)

The sample-size-weighted average of |observed − predicted| across bins. Lower is better. ECE = 0.05 means that, bin by bin, the predicted probability and the observed approval rate differ by about 5 percentage points on average.
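As a minimal sketch of that definition, the function below weights each bin's |observed − predicted| gap by its share of all samples. The ten equal-width bins and the name `expected_calibration_error` are assumptions for illustration; the engine's actual binning may differ.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Sample-size-weighted mean of |observed - predicted| over bins.

    Assumes equal-width bins over [0, 1] and outcomes coded as 0/1.
    """
    n_total = len(probs)
    if n_total == 0:
        raise ValueError("ECE is undefined with no observations")
    # Per bin: [sum of predicted probabilities, number of approvals, count]
    totals = [[0.0, 0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        totals[idx][0] += p
        totals[idx][1] += y
        totals[idx][2] += 1
    ece = 0.0
    for sum_p, approvals, n in totals:
        if n > 0:
            gap = abs(approvals / n - sum_p / n)
            ece += (n / n_total) * gap
    return ece


# Made-up example: the 0.2 bin is calibrated (1 of 5 approved); the 0.8 bin
# is off by 0.2 (5 of 5 approved). Each bin holds half of the samples.
probs = [0.2] * 5 + [0.8] * 5
outcomes = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
ece = expected_calibration_error(probs, outcomes)
```

Here the weighted gap works out to 0.5 × 0 + 0.5 × 0.2, so `ece` is approximately 0.1.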

Brier score

Mean squared error between predicted probability and 0/1 outcome. Lower is better. A model that always says P=base_rate scores ~base_rate × (1 − base_rate); a perfect model scores 0.
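The two reference points above can be checked with a short sketch. The function name `brier_score` and the four-lead sample are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    if len(probs) == 0:
        raise ValueError("Brier score is undefined with no observations")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Made-up sample: 1 approval out of 4 leads, so base_rate = 0.25.
outcomes = [1, 0, 0, 0]
base_rate = sum(outcomes) / len(outcomes)

# A model that always predicts the base rate.
constant_score = brier_score([base_rate] * len(outcomes), outcomes)

# A perfect model that predicts each outcome exactly.
perfect_score = brier_score([1.0, 0.0, 0.0, 0.0], outcomes)

print(constant_score, base_rate * (1 - base_rate))  # both 0.1875
print(perfect_score)  # 0.0
```

When the constant prediction equals the sample's own base rate, the match with base_rate × (1 − base_rate) is exact; against a base rate estimated from other data it holds only approximately.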