
Computed Metrics Grading

The single source of truth for "is this campaign good, bad, or neither?" across MCP, reporting, insights, and recommendations. There is no parallel grading path — every consumer bottoms out here.

MCP feed. The MCP serves ComputedMetrics.to_dict() as the canonical campaign payload (get_full_campaign_stats, get_org_campaign_metrics, and per-campaign on get_full_org_stats) — not the raw cached StatsResponse. There is no separate "MCP ComputedMetrics"; the MCP is one consumer of this model. See app/mcp/schema_docs.py for the agent-facing contract and reporting.md for the cache it reads.

Reading order: skim TL;DR → scan the section you need → jump to the Appendix for the gnarly bits (windowing, fallback semantics, promoted volume/rate). Each gotcha is anchored so other sections can link straight to it.

TL;DR

  1. Derive a ComputedMetrics from a StatsResponse, per-classifier (RAW / EXPERIMENT / PROMOTED).
  2. Grade each of its two splits (first_purchase, lifetime) with grade_metric → GOOD | NEUTRAL | BAD | UNKNOWN.
  3. Decide with decide_campaign(metrics) → PAUSE | SCALE | MONITOR | NO_ACTION. Pure function of lifetime grade × classifier.
  4. Route the decision per consumer using maturity (JustLaunched / Early / Mature). Maturity does NOT change the decision, only who sees it (see Appendix → Maturity routes, it doesn't decide).

Anchor files

| Concept | File |
|---|---|
| ComputedMetrics, MetricGrade, CampaignClassifier, CampaignMaturity, thresholds, grade_metric | app/core/models/insights/computed_metrics.py |
| decide_campaign, per-grader should_notify, headline/criteria/priority strings | app/core/models/insights/grader.py |
| Stats → ComputedMetrics derivation | app/methods/insights/metric_derivation/ |
| Promoted-parent wiring on top of derivation | app/methods/computed_metrics.py |

The shape

ComputedMetrics
├── classifier:        RAW | EXPERIMENT | PROMOTED
├── spend, weeks_live, mailers_sent
├── first_purchase:    MetricSplit + first_purchase_grade: MetricGrade
├── lifetime:          MetricSplit + lifetime_grade: MetricGrade
├── maturity:          (derived) JustLaunched | Early | Mature   ← routing metadata, not a grade input
└── basis:             provenance string for MCP/reporting consumers

Each MetricSplit holds revenue, roas, cac, and orders, plus optional uplift and win_probability. The two splits are graded independently. basis is provenance only; see Appendix → basis is provenance, not a grading input.
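For orientation, the tree above can be sketched as Python dataclasses. This is a minimal sketch that assumes Optional floats for every split value. The field types, defaults, and enum values are assumptions, not the definitions in computed_metrics.py.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CampaignClassifier(Enum):
    RAW = "RAW"
    EXPERIMENT = "EXPERIMENT"
    PROMOTED = "PROMOTED"


class MetricGrade(Enum):
    GOOD = "GOOD"
    NEUTRAL = "NEUTRAL"
    BAD = "BAD"
    UNKNOWN = "UNKNOWN"


@dataclass
class MetricSplit:
    # Sketch only: types and defaults are assumed, not copied from computed_metrics.py.
    revenue: Optional[float] = None
    roas: Optional[float] = None
    cac: Optional[float] = None  # the lifetime split always leaves this as None
    orders: Optional[float] = None  # float assumed, since incremental orders are compared to 1.0
    uplift: Optional[float] = None  # optional: only populated when a holdout signal exists
    win_probability: Optional[float] = None  # optional, same as uplift


@dataclass
class ComputedMetrics:
    classifier: CampaignClassifier
    spend: float
    weeks_live: float
    mailers_sent: int
    first_purchase: MetricSplit
    first_purchase_grade: MetricGrade
    lifetime: MetricSplit
    lifetime_grade: MetricGrade
    basis: str  # provenance for MCP/reporting only; never a grading input
    # maturity is derived from weeks_live / mailers_sent on read, so it is not stored here
```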

Classifier

Three classes, controlling which derivation runs and which decisions are eligible:

| Classifier | Trigger | Headline source |
|---|---|---|
| RAW | No holdout, no prior experiment | observed stats |
| EXPERIMENT | campaign.settings.holdout.enabled | current holdout's experiment_results |
| PROMOTED | A HoldoutToFull association exists in campaign_associations | parent's lifetime experiment_results |

Promoted classification requires a campaign_associations lookup, so call through app/methods/computed_metrics.py rather than compute_metrics directly (see Appendix → Use orchestration helpers for promoted campaigns).
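Here is a minimal sketch of the trigger table. It assumes both triggers arrive as precomputed booleans: holdout_enabled stands in for campaign.settings.holdout.enabled, and has_holdout_to_full_parent stands in for the campaign_associations lookup. The function name and the PROMOTED-before-EXPERIMENT precedence are assumptions. The basis strings come from the field mapping table.

```python
from enum import Enum


class CampaignClassifier(Enum):
    RAW = "RAW"
    EXPERIMENT = "EXPERIMENT"
    PROMOTED = "PROMOTED"


# basis strings as listed in the field mapping table
BASIS = {
    CampaignClassifier.RAW: "raw",
    CampaignClassifier.EXPERIMENT: "incremental(holdout)",
    CampaignClassifier.PROMOTED: "modeled(prior_experiment)",
}


def classify(holdout_enabled: bool, has_holdout_to_full_parent: bool) -> CampaignClassifier:
    """Hypothetical classifier; checking PROMOTED first is an assumed precedence."""
    if has_holdout_to_full_parent:
        return CampaignClassifier.PROMOTED
    if holdout_enabled:
        return CampaignClassifier.EXPERIMENT
    return CampaignClassifier.RAW
```

Suppose a caller never performs the association lookup. Then has_holdout_to_full_parent is effectively always False, and a promoted campaign falls through to RAW or EXPERIMENT. That is exactly the silent misclassification the orchestration helpers exist to prevent.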

Derivation

ComputedMetrics is derived on every read — not stored. Same StatsResponse, different fields per classifier.

graph LR
  AGG[(campaign_order_aggregates_hourly)] -->|fetch_hourly_aggregates<br/>± date window| HA[hourly rows]
  RCP[(campaign_recipients)] -->|fetch_recipient_totals_by_campaign<br/>± date window| RT[recipient totals]
  HA --> SUM[sum_aggregates_to_stats] --> SDICT[stats]
  RT -->|mailers_sent, total_cost| DERIV[_compute_derived_metrics]
  SDICT --> DERIV --> SDICT2[stats + roas/cpa]
  HA -->|holdout rows| EXPR[_compute_experiment_results]
  RT -->|Holdout count| EXPR --> ER[experiment_results]
  SDICT2 --> CLASSIFY[classify]
  ER --> CLASSIFY
  CLASSIFY -->|RAW| DR[derive_raw]
  CLASSIFY -->|EXPERIMENT| DE[derive_experiment]
  CLASSIFY -->|PROMOTED| DP[derive_promoted]
  DR & DE & DP --> CM[ComputedMetrics]

Field → source mapping

For PROMOTED, fields split into volume (from the child's stats) and rate (from the parent's lifetime experiment snapshot). This split is load-bearing — see Appendix → Promoted: volume vs rate.

| Field | RAW | EXPERIMENT | PROMOTED |
|---|---|---|---|
| spend, weeks_live, mailers_sent | stats.* | stats.* | child stats.* |
| first_purchase.revenue | stats.first_purchase_revenue | experiment_results[first_order].incremental_revenue (fallback: stats.first_purchase_revenue) | child_spend × parent_first_purchase_roas |
| first_purchase.orders | stats.first_purchase_orders | experiment_results[first_order].orders (fallback: stats.first_purchase_orders) | child stats.first_purchase_orders |
| first_purchase.roas | stats.first_purchase_roas | experiment_results[first_order].incremental_roas (fallback: 0.0) | parent experiment_results[first_order].incremental_roas |
| first_purchase.cac | spend / orders (or None) | experiment_results[first_order].incremental_customer_acquisition_cost (fallback: 0.0) | parent value, gated on parent's incremental orders |
| lifetime.revenue | stats.revenue ‖ stats.campaign_revenue | experiment_results[all_orders].incremental_revenue (fallback: same as RAW) | child_spend × parent_lifetime_roas |
| lifetime.orders | stats.campaign_orders ‖ stats.orders | experiment_results[all_orders].orders (fallback: same as RAW) | child stats.campaign_orders |
| lifetime.roas | stats.all_time_roas ‖ stats.roas | experiment_results[all_orders].incremental_roas (fallback: 0.0) | parent experiment_results[all_orders].incremental_roas |
| lifetime.cac | None | None | None |
| uplift, win_probability | unused | experiment_results[*].metric_uplift / .win_probability (no fallback) | parent values |
| basis | "raw" | "incremental(holdout)" | "modeled(prior_experiment)" |

The fallback rows for EXPERIMENT and the windowing-induced zeros have surprising downstream effects — see Appendix → Experiment fallback and Appendix → Date windowing.

Maturity (routing, not grading)

JustLaunched : weeks_live <= 4  OR  mailers_sent < 500
Early        : weeks_live <= 10 OR  mailers_sent < 5000
Mature       : weeks_live > 10  AND mailers_sent >= 5000

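A long-running campaign with few mailers is not mature: volume matters. The rule is tunable in CampaignMaturity.derive. Note that the current derive applies only the weeks_live terms (see Known limitations → Maturity is time-only; volume gate lives elsewhere).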
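The maturity rule reads as an ordered check in which the first match wins. The sketch below encodes the documented rule as written. derive_maturity and the enum member spellings are hypothetical. Per Known limitations, the current CampaignMaturity.derive uses weeks_live only, so this is the intended rule rather than current behavior.

```python
from enum import Enum


class CampaignMaturity(Enum):
    JUST_LAUNCHED = "JustLaunched"
    EARLY = "Early"
    MATURE = "Mature"


def derive_maturity(weeks_live: float, mailers_sent: int) -> CampaignMaturity:
    """Hypothetical transcription of the documented rule; checks run top to bottom."""
    if weeks_live <= 4 or mailers_sent < 500:
        return CampaignMaturity.JUST_LAUNCHED
    if weeks_live <= 10 or mailers_sent < 5000:
        return CampaignMaturity.EARLY
    # Only reachable when weeks_live > 10 AND mailers_sent >= 5000.
    return CampaignMaturity.MATURE
```

derive_maturity(20, 300) lands in JustLaunched: the campaign has been live for twenty weeks but has sent too few mailers to count as mature.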

| Consumer | Maturity gate |
|---|---|
| decide_campaign | none (grade × classifier only) |
| CampaignGrader.should_notify (Slack, CSM insights) | Early+ |
| _evaluate_campaign (writes Recommendation row) | Mature only |
| CampaignGrader.priority | Mature → HIGH; Early + PAUSE → MEDIUM; else LOW |

Maturity is not a decision input — see Appendix → Maturity routes, it doesn't decide.
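The two grader gates in the table can be sketched as standalone functions. The sketch assumes the decision and maturity are computed upstream. The function names and the Priority enum are hypothetical, since the real logic sits on CampaignGrader. Neither function returns or alters a Decision; maturity only decides whether, and how loudly, a decision is surfaced.

```python
from enum import Enum


class Decision(Enum):
    PAUSE = "PAUSE"
    SCALE = "SCALE"
    MONITOR = "MONITOR"
    NO_ACTION = "NO_ACTION"


class CampaignMaturity(Enum):
    JUST_LAUNCHED = "JustLaunched"
    EARLY = "Early"
    MATURE = "Mature"


class Priority(Enum):  # hypothetical stand-in for the priority strings
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


def should_notify(decision: Decision, maturity: CampaignMaturity) -> bool:
    """Slack / CSM insights: Early or Mature, and a decision other than NO_ACTION."""
    return maturity is not CampaignMaturity.JUST_LAUNCHED and decision is not Decision.NO_ACTION


def priority(decision: Decision, maturity: CampaignMaturity) -> Priority:
    """Mature -> HIGH; Early + PAUSE -> MEDIUM; everything else -> LOW."""
    if maturity is CampaignMaturity.MATURE:
        return Priority.HIGH
    if maturity is CampaignMaturity.EARLY and decision is Decision.PAUSE:
        return Priority.MEDIUM
    return Priority.LOW
```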

Grading a split

grade_metric(roas, cac, weeks_live, has_orders) → MetricGrade is the only grading function. Runs once per split.

UNKNOWN   weeks_live < 1  OR  not has_orders  OR  roas is None
GOOD      roas >= 2.0    AND  (cac is None or cac <= 40)
BAD       roas < 1.0     OR   cac > 180
NEUTRAL   everything else
| Threshold | Value |
|---|---|
| GOOD_ROAS_THRESHOLD | 2.0 |
| GOOD_CAC_THRESHOLD | 40.0 |
| BAD_ROAS_THRESHOLD | 1.0 |
| BAD_CAC_THRESHOLD | 180.0 |

All four live in computed_metrics.py. Do not redefine elsewhere.
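This is a direct transcription of the grading rules and thresholds above, as a sketch rather than the code in computed_metrics.py. Treating cac <= 0 as absent is an assumption drawn from the CAC sentinel entry in Known limitations. Checks run UNKNOWN, then GOOD, then BAD, then NEUTRAL. GOOD and BAD can never both match, because GOOD needs roas >= 2.0 and a CAC of at most 40.

```python
from enum import Enum
from typing import Optional

GOOD_ROAS_THRESHOLD = 2.0
GOOD_CAC_THRESHOLD = 40.0
BAD_ROAS_THRESHOLD = 1.0
BAD_CAC_THRESHOLD = 180.0


class MetricGrade(Enum):
    GOOD = "GOOD"
    NEUTRAL = "NEUTRAL"
    BAD = "BAD"
    UNKNOWN = "UNKNOWN"


def grade_metric(
    roas: Optional[float], cac: Optional[float], weeks_live: float, has_orders: bool
) -> MetricGrade:
    """Sketch of the documented rule, not the real grade_metric."""
    if weeks_live < 1 or not has_orders or roas is None:
        return MetricGrade.UNKNOWN
    # Assumed: a 0.0 CAC is the "too few orders to trust" sentinel, so any
    # non-positive CAC is treated the same as a missing one.
    usable_cac = cac if cac is not None and cac > 0 else None
    if roas >= GOOD_ROAS_THRESHOLD and (usable_cac is None or usable_cac <= GOOD_CAC_THRESHOLD):
        return MetricGrade.GOOD
    if roas < BAD_ROAS_THRESHOLD or (usable_cac is not None and usable_cac > BAD_CAC_THRESHOLD):
        return MetricGrade.BAD
    return MetricGrade.NEUTRAL
```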

Decision

decide_campaign(metrics) → Decision is pure grade × classifier:

flowchart TD
  start([ComputedMetrics]) --> grade{lifetime_grade}
  grade -->|BAD| pause[PAUSE]
  grade -->|GOOD| cls{classifier}
  grade -->|NEUTRAL / UNKNOWN| noop[NO_ACTION]
  cls -->|EXPERIMENT| scale[SCALE]
  cls -->|RAW or PROMOTED| monitor[MONITOR]

Three things that catch people:

  • PAUSE fires the moment lifetime_grade == BAD, regardless of maturity. Maturity only gates surfacing.
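  • SCALE is reserved for live experiments graded GOOD. A promoted campaign that is doing well gets MONITOR, because the scale decision already happened (see Appendix → SCALE is for experiments only; promoted GOOD is MONITOR).
  • NEUTRAL (mostly ROAS in [1.0, 2.0), plus ROAS >= 2.0 held back by a CAC in (40, 180]) is intentionally NO_ACTION (see Appendix → NEUTRAL is intentionally NO_ACTION).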
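The flowchart maps to a few lines of branching. This sketch takes the lifetime grade and classifier directly, rather than a whole ComputedMetrics, to make the "pure function of grade × classifier" property explicit. The name decide and its signature are hypothetical and do not match decide_campaign's actual signature.

```python
from enum import Enum


class MetricGrade(Enum):
    GOOD = "GOOD"
    NEUTRAL = "NEUTRAL"
    BAD = "BAD"
    UNKNOWN = "UNKNOWN"


class CampaignClassifier(Enum):
    RAW = "RAW"
    EXPERIMENT = "EXPERIMENT"
    PROMOTED = "PROMOTED"


class Decision(Enum):
    PAUSE = "PAUSE"
    SCALE = "SCALE"
    MONITOR = "MONITOR"
    NO_ACTION = "NO_ACTION"


def decide(lifetime_grade: MetricGrade, classifier: CampaignClassifier) -> Decision:
    """Hypothetical sketch: grade x classifier only, deliberately no maturity parameter."""
    if lifetime_grade is MetricGrade.BAD:
        return Decision.PAUSE  # fires regardless of maturity
    if lifetime_grade is MetricGrade.GOOD:
        # Only a live experiment can still be scaled; RAW and PROMOTED are watched.
        return Decision.SCALE if classifier is CampaignClassifier.EXPERIMENT else Decision.MONITOR
    return Decision.NO_ACTION  # NEUTRAL and UNKNOWN
```

Because maturity is not even a parameter, a maturity branch has no way to creep into the decision. Routing has to happen in the consumer.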

Where the decision is used

| Caller | Behavior |
|---|---|
| campaign_grader.py | Wraps in InsightData; should_notify requires Early+ AND non-NO_ACTION. |
| recommendations.py::_evaluate_campaign | SCALE → ScaleExperiment rec, PAUSE → PauseCampaign rec. Requires Mature. |
| CampaignGrader (in grader.py) | Headline/criteria/priority strings (CSM-facing). |
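The recommendation row is gated twice: once by maturity and, per Known limitations, again by MIN_MAILERS_FOR_RECOMMENDATION in recommendations.py. Below is a hypothetical sketch of that routing, with the decision-to-row mapping taken from the table above. The function name, the string kinds, and the >= comparison against the mailer floor are assumptions.

```python
from enum import Enum
from typing import Optional

MIN_MAILERS_FOR_RECOMMENDATION = 500  # documented value; the >= comparison below is assumed


class Decision(Enum):
    PAUSE = "PAUSE"
    SCALE = "SCALE"
    MONITOR = "MONITOR"
    NO_ACTION = "NO_ACTION"


class CampaignMaturity(Enum):
    JUST_LAUNCHED = "JustLaunched"
    EARLY = "Early"
    MATURE = "Mature"


# Only SCALE and PAUSE ever produce a Recommendation row.
REC_KIND = {Decision.SCALE: "ScaleExperiment", Decision.PAUSE: "PauseCampaign"}


def recommendation_kind(
    decision: Decision, maturity: CampaignMaturity, mailers_sent: int
) -> Optional[str]:
    """Hypothetical: the Recommendation kind to write, or None to write nothing."""
    if maturity is not CampaignMaturity.MATURE:
        return None  # routing gate: recommendations are Mature-only
    if mailers_sent < MIN_MAILERS_FOR_RECOMMENDATION:
        return None  # separate volume gate kept in recommendations.py
    return REC_KIND.get(decision)
```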

Worked examples

| Scenario | weeks_live | mailers_sent | classifier | lifetime ROAS | decide | should_notify | Rec written? |
|---|---|---|---|---|---|---|---|
| Brand-new send, no orders | 2 | 800 | RAW | n/a (UNKNOWN) | NO_ACTION | No | No |
| JustLaunched, BAD | 2 | 800 | RAW | 0.4 | PAUSE | No (maturity) | No |
| Live experiment crushing it (Early) | 6 | 8000 | EXPERIMENT | 3.1 | SCALE | Yes | No (needs Mature) |
| Same experiment, mature | 12 | 20000 | EXPERIMENT | 3.1 | SCALE | Yes | Yes (ScaleExperiment) |
| Mature automation, mid-band | 14 | 18000 | RAW | 1.4 | NO_ACTION | No | No |
| Mature automation, underwater | 14 | 18000 | RAW | 0.7 | PAUSE | Yes | Yes (PauseCampaign) |
| Promoted send, doing well | 12 | 20000 | PROMOTED | 2.5 | MONITOR | Yes | No |

Tuning

Knobs:

  • Four numeric thresholds, the maturity rule, and the decision tree: computed_metrics.py / grader.py.
  • Per-consumer routing gates: next to each consumer (should_notify, _evaluate_campaign's Mature check, and MIN_MAILERS_FOR_RECOMMENDATION in app/methods/recommendations.py).

Tune in place. Do not introduce a parallel grading path. If per-org tuning becomes necessary, extend grade_metric / decide_campaign to take an OrgConfig-like argument rather than branching at call sites.


Appendix: Gotchas & things to know

Stuff that looks right but isn't, plus the design choices that aren't obvious from the field tables. Each entry has a stable anchor so the sections above can link straight in.

Promoted: volume vs rate

A promoted campaign carries forward the rate (ROAS, CAC, uplift, win_probability) that its parent's holdout experiment measured, and multiplies it against the volume (spend, mailers) the child has actually shipped.

modeled_revenue = child_windowed_spend × parent_lifetime_roas

Two roles, two sources:

  • Volume → child stats. Windowable. Narrows with start_date / end_date.
  • Rate → parent's lifetime experiment_results. Never windowed.

Mixing these is a silent data-quality bug — a date-windowed rate projected onto live volume is meaningless. The orchestration layer enforces this at runtime by fetching parent stats through fetch_parent_stats_for_promoted, and derive_promoted only accepts a PriorExperimentSnapshot — a windowed StatsResponse won't type-check as a rate source. Persistence of the snapshot at promotion time is still open; see Known limitations & open work.

Stationarity assumption. Using a lifetime parent ROAS assumes that rate is a roughly stable property of the audience × creative. If a parent experiment is old and the audience has shifted, the modeled revenue will drift from reality. There's no age-out today; consider it if this becomes a complaint.

Example. Child has shipped $10k this month; parent's lifetime incremental ROAS is 2.4. Modeled revenue = $24k — even though the child has no holdout of its own.
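Here is a minimal sketch of the volume × rate split for the revenue fields. It assumes a PriorExperimentSnapshot with hypothetical first_purchase_roas and lifetime_roas fields; only snapshot_taken_at is named in this document. Typing the rate argument as a snapshot, rather than as a stats object, is what lets a type checker reject a windowed StatsResponse as the rate source. CAC, uplift, and win_probability are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class PriorExperimentSnapshot:
    """Hypothetical fields: the parent's lifetime incremental rates, never windowed."""
    first_purchase_roas: float
    lifetime_roas: float
    snapshot_taken_at: Optional[datetime] = None  # reserved for persistence; unused today


def modeled_revenue(
    child_windowed_spend: float, parent: PriorExperimentSnapshot
) -> Tuple[float, float]:
    """Volume (child spend, windowable) x rate (parent snapshot), for both splits.

    Returns (first_purchase_revenue, lifetime_revenue).
    """
    return (
        child_windowed_spend * parent.first_purchase_roas,
        child_windowed_spend * parent.lifetime_roas,
    )
```

With the example's numbers, a snapshot whose lifetime_roas is 2.4 and a child spend of 10,000 give a lifetime value of about 24,000. Halving the window's spend halves both outputs, while the rate is untouched.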

Date windowing has cascading effects

start_date / end_date apply to two underlying queries:

| Source | Date filter? | Affects |
|---|---|---|
| fetch_hourly_aggregates | yes, on hour_bucket | order-derived columns (campaign_revenue, first_purchase_revenue, …) |
| fetch_recipient_totals_by_campaign | yes, on campaign_recipients.created_at | mailers_sent, total_cost, holdout count |

Campaigns send in bursts; orders trickle in over months. A typical "last N days" window excludes the recipient rows (dated at send time) while still capturing orders. Knock-on effects:

  • mailers_sent == 0 and total_cost == 0 → roas / first_purchase_roas / cpa are coerced to 0.0 (divide-by-zero guard). Note: this is a coerced zero, not None, and it grades as BAD if has_orders is true.
  • Holdout recipient_count == 0 → experiment_results returns [], which triggers the experiment fallback.

Per-classifier effect under a date window:

| Classifier | Behavior |
|---|---|
| RAW | Partial degradation: revenue / orders retain real values; roas lands at 0.0. |
| EXPERIMENT | Falls back to observed stats when incremental can't compute. |
| PROMOTED | Volume narrows; rate stays lifetime; modeled_revenue narrows linearly with spend. |

Example. Campaign sent 50,000 pieces in March; orders are still rolling in. Querying "last 30 days" in May returns revenue from late-arriving orders, but mailers_sent = 0, so ROAS reports 0.0.
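The divide-by-zero guard, and the ambiguity it creates, can be sketched in a few lines. derived_roas and is_window_artifact are hypothetical helpers; the real guard sits in _compute_derived_metrics. ROAS is assumed here to be revenue / total_cost.

```python
def derived_roas(revenue: float, total_cost: float) -> float:
    """Hypothetical stand-in for the ROAS step of _compute_derived_metrics."""
    if total_cost == 0:
        return 0.0  # divide-by-zero guard: a coerced zero, not None
    return revenue / total_cost


def is_window_artifact(mailers_sent: int, roas: float) -> bool:
    """Hypothetical: True when 0.0 ROAS only means no sends fell inside the window."""
    return mailers_sent == 0 and roas == 0.0
```

For the May query above, derived_roas returns 0.0 for any late-arriving revenue, because the windowed total_cost is 0. is_window_artifact(0, 0.0) flags that case so a consumer can avoid treating it as a real BAD signal.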

Experiment fallback returns roas = 0.0, not None

When experiment_results can't compute incremental (e.g. zero holdout in window), derive_experiment falls back to observed values from stats:

incremental_revenue is None AND experiment_orders is None
  → revenue ← stats.first_purchase_revenue (or stats.revenue ‖ campaign_revenue for lifetime)
  → orders  ← stats.first_purchase_orders  (or stats.campaign_orders ‖ orders for lifetime)
  → roas    ← 0.0   (stable-shape signal: "incremental not computable here")
  → first_purchase.cac ← 0.0   (lifetime.cac stays None as always)
  → uplift / win_probability stay None
  → has_orders OR's in has_positive_count(fallback orders) so grading sees them

When incremental is computable, nothing changes.

Trap: the roas = 0.0 is a sentinel meaning "we have no holdout signal," not "the campaign earned zero." Consumers that need to distinguish the two should check mailers_sent == 0 or look for populated uplift / win_probability. A naive "is ROAS bad?" check will mis-grade these as BAD.
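The fallback for one split can be sketched as follows. The sketch assumes a flattened ExperimentResult stand-in whose field names mirror the ones used in this document. The dataclasses and derive_first_purchase are hypothetical, and the has_orders bookkeeping is omitted. has_holdout_signal applies the check recommended in the trap above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExperimentResult:
    """Hypothetical stand-in for one experiment_results entry (e.g. first_order)."""
    incremental_revenue: Optional[float] = None
    orders: Optional[float] = None
    incremental_roas: Optional[float] = None
    incremental_customer_acquisition_cost: Optional[float] = None
    metric_uplift: Optional[float] = None
    win_probability: Optional[float] = None


@dataclass
class Split:
    revenue: Optional[float]
    orders: Optional[float]
    roas: Optional[float]
    cac: Optional[float]
    uplift: Optional[float] = None
    win_probability: Optional[float] = None


def derive_first_purchase(
    result: Optional[ExperimentResult], observed_revenue: float, observed_orders: float
) -> Split:
    """Hypothetical sketch of the EXPERIMENT first_purchase split with fallback."""
    r = result or ExperimentResult()  # experiment_results == [] behaves like "all None"
    if r.incremental_revenue is None and r.orders is None:
        # Fallback: observed volume plus 0.0 sentinels; uplift / win_probability stay None.
        return Split(revenue=observed_revenue, orders=observed_orders, roas=0.0, cac=0.0)
    return Split(
        revenue=r.incremental_revenue,
        orders=r.orders,
        roas=r.incremental_roas,
        cac=r.incremental_customer_acquisition_cost,
        uplift=r.metric_uplift,
        win_probability=r.win_probability,
    )


def has_holdout_signal(split: Split) -> bool:
    """Populated uplift / win_probability means the numbers came from a holdout."""
    return split.uplift is not None or split.win_probability is not None
```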

is_empty() / empty_dict() are temporary

ComputedMetrics.is_empty() returns True when revenue, orders, roas, and cac are all None in both splits. The bulk route uses this to emit empty_dict() (same-shape zeros) so the web-vs-API divergence check can compare stable shapes.

Both methods are temporary — revert to None once divergence work is done. With the experiment fallback in place, is_empty() now only fires when there is genuinely no data.
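Here is a minimal sketch of the is_empty() rule, using a hypothetical SplitValues that holds only the four fields the rule inspects. It also shows why the fallback makes is_empty() rare: a fallback split carries observed revenue and orders plus 0.0 sentinels, so it is never all-None.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SplitValues:
    """Hypothetical: only the four fields the is_empty() rule inspects."""
    revenue: Optional[float] = None
    orders: Optional[float] = None
    roas: Optional[float] = None
    cac: Optional[float] = None

    def all_none(self) -> bool:
        return all(getattr(self, f.name) is None for f in fields(self))


def is_empty(first_purchase: SplitValues, lifetime: SplitValues) -> bool:
    """Sketch: True only when every inspected field in both splits is None."""
    return first_purchase.all_none() and lifetime.all_none()
```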

Maturity routes, it doesn't decide

A JustLaunched BAD campaign still produces PAUSE. Maturity only gates who sees the decision (Slack/CSM vs. recommendation row vs. nothing). Don't add maturity branches inside decide_campaign — the routing belongs at the consumer.

Why: decisions are facts about the metric; whether to act on a noisy fact is a separate concern that depends on the consumer's tolerance for false positives.

SCALE is for experiments only; promoted GOOD is MONITOR

SCALE means "promote this experiment to full audience." A campaign that's already PROMOTED has, by definition, already been scaled — there's nothing left to scale. So a GOOD PROMOTED maps to MONITOR, not SCALE.

If you ever see a SCALE recommendation against a PROMOTED campaign, something is wrong upstream.

NEUTRAL is intentionally NO_ACTION

ROAS in [1.0, 2.0) is the "fine, but not exciting" band. We deliberately do not surface these — surfacing them would be noise. If product wants to surface a "watch" state, add it as a new Decision value rather than re-mapping NEUTRAL.

basis is provenance, not a grading input

basis ("raw" / "incremental(holdout)" / "modeled(prior_experiment)") tells MCP / reporting where the headline numbers came from. It is purely descriptive. Do not branch grading on it — the classifier already encodes the same information in a way that's safe to switch on.

Use orchestration helpers for promoted campaigns

Promoted classification requires a campaign_associations lookup to find the parent. Calling compute_metrics directly will miss this and silently classify a promoted campaign as RAW or EXPERIMENT.

Always go through app/methods/computed_metrics.py for any code path that might see promoted campaigns (i.e. anything that reads metrics for an arbitrary campaign).

Known limitations & open work

Things a reader needs to know about current code that aren't obvious from the field tables.

Parent snapshot is re-aggregated on every read

For PROMOTED, "parent lifetime experiment_results" is computed on each request rather than snapshotted at promotion time. A late-arriving order attributed to a parent silently changes every active promoted child's modeled revenue and grade. Combined with the bright-line thresholds (roas >= 2.0 → GOOD, < 1.0 → BAD), small drifts can flip grades overnight with no audit trail.

The runtime contract (rate source must be a PriorExperimentSnapshot, not a windowed StatsResponse) is in place; the persistence of that snapshot at promotion time is not. PriorExperimentSnapshot.snapshot_taken_at is reserved for this but currently unused.

CAC sentinel: 0.0 means "denominator too small to trust"

incremental_customer_acquisition_cost = spend / incremental_orders blows up (becomes huge or divides by zero) when incremental_orders is below MIN_INCREMENTAL_ORDERS_FOR_CAC (currently 1.0). In that case safe_cac (app/methods/insights/metric_derivation/helpers.py) returns 0.0 as a stable-shape sentinel, matching the experiment-fallback pattern. grade_metric requires cac > 0 to engage the GOOD or BAD branch on CAC.

This is a parseability shim until the web client tolerates None in CAC. Touch points are tagged with TODO comments. Same caveat as experiment fallback: a naive "is CAC zero?" check will mis-read these.
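Below is a sketch of the CAC sentinel and of how grading effectively reads it. It assumes safe_cac takes spend and incremental orders and treats a missing denominator like a too-small one. The signature and the None handling are assumptions; the real helper lives in helpers.py. cac_for_grading is a hypothetical name for the cac > 0 check.

```python
from typing import Optional

MIN_INCREMENTAL_ORDERS_FOR_CAC = 1.0  # documented current value


def safe_cac(spend: float, incremental_orders: Optional[float]) -> float:
    """Sketch: spend / incremental_orders, or the 0.0 sentinel when the denominator is too small."""
    if incremental_orders is None or incremental_orders < MIN_INCREMENTAL_ORDERS_FOR_CAC:
        return 0.0  # means "too few incremental orders to trust", not "free customers"
    return spend / incremental_orders


def cac_for_grading(cac: Optional[float]) -> Optional[float]:
    """Hypothetical: what grade_metric effectively sees; a non-positive CAC counts as absent."""
    return cac if cac is not None and cac > 0 else None
```

Take a spend of 1,000 and 0.25 incremental orders: the raw ratio would be 4,000, far past BAD_CAC_THRESHOLD. The sentinel returns 0.0 instead, and cac_for_grading maps that back to "no CAC", so the CAC branches stay out of grading.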

Maturity is time-only; volume gate lives elsewhere

CampaignMaturity.derive is purely a function of weeks_live, even though the Maturity section above describes it as (weeks_live, mailers_sent)-aware. A separate MIN_MAILERS_FOR_RECOMMENDATION = 500 lives in app/methods/recommendations.py and gates whether _evaluate_campaign writes a Recommendation row, but it is invisible to decide_campaign and to should_notify. If you tune maturity, tune both, or fold the volume gate into CampaignMaturity.derive and delete the orphan.

empty_dict() / is_empty() shim

See Appendix → is_empty() / empty_dict() are temporary. The bulk reporting route emits empty_dict() (same-shape zeros) instead of null so a web-vs-API divergence checker can compare like shapes. Revert to None once that work completes.