PRD: SKU-Level Billing
Owner: Brandon Shin
Linear project: Stripe Billing Migration — SKU Specific Meters
Companions: migration-rules.md, provisioning-behavior.md.
This PRD describes the runtime billing contract as it stands post-ENG-2914: mode-dispatched preflight, with OrganizationConfiguration.billing_mode as the routing key. The previous PRD revisions (atomic-flip cutover → live waterfall → mode-dispatched) are git history; this file is the single current source.
1. Problem (one paragraph)
PaperRun's flat sent_mailer meter loses revenue silently (Stripe accepts events without an attached Price), can't be reconciled on the customer's invoice (one line for every format), and has three disagreeing sources of truth for unit price (Stripe.Price.unit_amount, OrganizationConfiguration.mailer_price, and a hardcoded 0.65 in /v1/billing/<org_id>/costs). The fix is per-SKU metering with a single DB-resident source of truth, gated by a preflight check that fails closed.
2. Requirements
| ID | Requirement |
|---|---|
| R1 | Every CampaignTemplateSize resolves to its own price (1:1 with billing key). Unknown sizes fail closed. |
| R2 | A customer running N formats sees N invoice line items. |
| R3 | One authoritative price per (org, billing_key), stored in billing_rate_card_entries. Stripe is reconciled to match — never the reverse. |
| R4 | No send is dispatched unless preflight_campaign_billing(campaign) returns passed=True. |
| R5 | Migration day preserves every org's effective rate (default $0.65, A6_NL $0.80, negotiated rates retained). |
| R6 | Opt-in is per (org, billing_key) — one rate card row going pending → active. |
| R7 | Every billed send persists the metadata needed to reconstruct its price (rate card version, meter event name, resolved unit amount, currency). |
3. The Model: Mode-Dispatched Preflight
OrganizationConfiguration.billing_mode is the routing key. It is authoritative and read on every send.
```text
BillingMode
├── OrgFlatMeter     → evaluate against org's flat sent_mailer (or bfcm_send for BFCM)
└── SkuSpecificMeter → evaluate against billing_rate_card_entries row for (org, billing_key)
```
There is no waterfall and no silent fall-through between modes. A SkuSpecificMeter org with a missing rate-card row is a hard block, not a fallback to legacy — that is the revenue-leakage scenario the migration exists to close.
3.1 Single call site
One mode-dispatched evaluator, used by production callers and operator scripts alike:
```python
# app/methods/billing_preflight.py
def preflight_campaign_billing(campaign: Campaign) -> PreflightOutcome:
    """Dispatch on organization.configuration.billing_mode to the org-flat or SKU-specific evaluator."""
    ...
```
Both production call sites and operator triage scripts invoke preflight_campaign_billing directly. An earlier draft of this PRD proposed splitting the production seam into a separate billing_gateway.check_can_bill wrapper for a production-vs-operator breadcrumb tag; that extraction was reverted as over-engineered for three call sites — see §7 G1.
Production call sites:
- app/routes/campaign_routes.py (activation gate)
- app/celery/proofer.py (proofing)
- app/celery/mailer.py (pre-send check + per-recipient send)
PreflightOutcome carries passed, route (org_flat_meter | sku_specific_meter | none), the rate-card entry id when applicable, hard failures, soft warnings, and inspector-only diagnostics.
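The following is a minimal sketch of the dispatch contract. It assumes that BillingMode is a str enum whose values are the route literals from §7 G4. OrgStub, the _evaluate_* helpers and dispatch are hypothetical stand-ins for the real evaluators. They are reduced to the one rule §3 cares about: a SkuSpecificMeter org with no active row blocks with route="none" and does not fall through to the flat meter.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class BillingMode(str, Enum):
    # Values double as PreflightOutcome.route literals (§7 G4).
    ORG_FLAT_METER = "org_flat_meter"
    SKU_SPECIFIC_METER = "sku_specific_meter"


@dataclass
class PreflightOutcome:
    passed: bool
    route: str  # "org_flat_meter" | "sku_specific_meter" | "none"
    rate_card_entry_id: Optional[int] = None
    failures: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    diagnostics: list[str] = field(default_factory=list)


@dataclass
class OrgStub:
    # Hypothetical stand-in for Organization + OrganizationConfiguration.
    billing_mode: BillingMode
    # Id of the active rate card row for the campaign's billing key, if any.
    rate_card_entry_id: Optional[int] = None


def _evaluate_org_flat(org: OrgStub) -> PreflightOutcome:
    # Placeholder: the real evaluator checks the flat item in the snapshot.
    return PreflightOutcome(passed=True, route=BillingMode.ORG_FLAT_METER.value)


def _evaluate_sku_specific(org: OrgStub) -> PreflightOutcome:
    if org.rate_card_entry_id is None:
        # Hard block. Deliberately no "try the flat meter instead" branch.
        return PreflightOutcome(passed=False, route="none", failures=["NO_RATE_CARD_ENTRY"])
    return PreflightOutcome(
        passed=True,
        route=BillingMode.SKU_SPECIFIC_METER.value,
        rate_card_entry_id=org.rate_card_entry_id,
    )


_EVALUATORS: dict[BillingMode, Callable[[OrgStub], PreflightOutcome]] = {
    BillingMode.ORG_FLAT_METER: _evaluate_org_flat,
    BillingMode.SKU_SPECIFIC_METER: _evaluate_sku_specific,
}


def dispatch(org: OrgStub) -> PreflightOutcome:
    # Exactly one evaluator runs, chosen solely by billing_mode.
    return _EVALUATORS[BillingMode(org.billing_mode)](org)
```

BillingMode(...) raises ValueError on an unrecognized string. A corrupt billing_mode value therefore also fails closed instead of defaulting to a mode.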
3.2 Inputs & caches
Every preflight call reads two things:
- The Organization and its OrganizationConfiguration (Postgres, cached per-request).
- A SubscriptionSnapshot: a Pydantic slice of the org's live Stripe subscription items, cached in Redis at billing:preflight:sub:{organization_id} with a 30-minute TTL (_SNAPSHOT_TTL_SECONDS = 1800 in app/methods/billing_preflight.py). The provisioner busts the key after attaching a new item. Operator scripts that mutate Stripe out-of-band must invoke bust_subscription_snapshot (or pass --bust-cache where supported) for the change to be visible to preflight inside the TTL window.
Net cost: one Stripe Subscription.retrieve round-trip per org per ~30 min, regardless of send volume.
3.3 Reason codes (canonical)
The fixed set the preflight emits today (PreflightFailureCode, PreflightWarningCode, PreflightDiagnosticCode):
| Code | Class | Mode | Meaning |
|---|---|---|---|
| NO_STRIPE_CUSTOMER | failure | both | Org has no stripe_customer_id. |
| NO_ACTIVE_SUBSCRIPTION | failure | both | No active subscription with billable items. |
| NO_FLAT_METER_ITEM_ATTACHED | failure | OrgFlat | Required flat item (sent_mailer, or bfcm_send for BFCM) not on subscription. |
| FLAT_METER_ITEM_MISSING_UNIT_AMOUNT | failure | OrgFlat | Flat item's Price has no unit_amount (e.g., tier-priced). |
| FLAT_METER_ITEM_MISSING_CURRENCY | failure | OrgFlat | Flat item's Price has no currency. |
| FLAT_METER_PRICE_DRIFT | failure | OrgFlat | Stripe unit_amount ≠ mailer_price * 100 (non-BFCM only). |
| NO_RATE_CARD_ENTRY | failure | SkuSpecific | No active rate card row for (org, billing_key). |
| RATE_CARD_STRIPE_DRIFT | failure | SkuSpecific | Rate card row's stripe_subscription_item_id not present in live snapshot, or mismatched price/meter. |
| PER_SKU_PRICE_DRIFT | warning | SkuSpecific | Live unit_amount ≠ rate card unit_amount_cents. Bills at DB value (D10) and emits warning. |
| FLAT_METER_CANONICAL_DRIFT | diagnostic | OrgFlat | Inspector-only: mailer_price below migration-rules.md canonical default. |
| FLAT_METER_CANONICAL_DRIFT_PINNED | diagnostic | OrgFlat | Inspector-only: pinned key (A6_NL) diverges from canonical. |
The set is open for extension; it never shrinks.
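The three code families can be expressed as str-valued enums. The member names below come from the table. Making each value equal to its name is an assumption. The removed_codes helper is also an assumption: it shows one way to enforce "never shrinks" in CI against a committed baseline.

```python
from enum import Enum


class PreflightFailureCode(str, Enum):
    # Hard failures: passed=False, the send is blocked.
    NO_STRIPE_CUSTOMER = "NO_STRIPE_CUSTOMER"
    NO_ACTIVE_SUBSCRIPTION = "NO_ACTIVE_SUBSCRIPTION"
    NO_FLAT_METER_ITEM_ATTACHED = "NO_FLAT_METER_ITEM_ATTACHED"
    FLAT_METER_ITEM_MISSING_UNIT_AMOUNT = "FLAT_METER_ITEM_MISSING_UNIT_AMOUNT"
    FLAT_METER_ITEM_MISSING_CURRENCY = "FLAT_METER_ITEM_MISSING_CURRENCY"
    FLAT_METER_PRICE_DRIFT = "FLAT_METER_PRICE_DRIFT"
    NO_RATE_CARD_ENTRY = "NO_RATE_CARD_ENTRY"
    RATE_CARD_STRIPE_DRIFT = "RATE_CARD_STRIPE_DRIFT"


class PreflightWarningCode(str, Enum):
    # Send proceeds; the warning rides along on PreflightOutcome.warnings.
    PER_SKU_PRICE_DRIFT = "PER_SKU_PRICE_DRIFT"


class PreflightDiagnosticCode(str, Enum):
    # Inspector-only; never affects passed.
    FLAT_METER_CANONICAL_DRIFT = "FLAT_METER_CANONICAL_DRIFT"
    FLAT_METER_CANONICAL_DRIFT_PINNED = "FLAT_METER_CANONICAL_DRIFT_PINNED"


def removed_codes(baseline: set[str], current: type[Enum]) -> set[str]:
    """Codes present in a committed baseline but missing from the enum now.

    A CI check asserting this is empty enforces "open for extension, never shrinks".
    """
    return baseline - {member.value for member in current}
```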
3.4 Invariants
- A campaign references a billing key; it does not own pricing state.
- The DB row is canonical. Stripe is provisioned from it (D10). Drift is fixed by re-running the provisioner, not by patching the DB.
- Rate card entries are append-only. Substantive fields are immutable; the lifespan is [active_at, inactive_at); a partial unique index pins one current row per (org, billing_key).
- Exactly one meter event per send: either the per-SKU meter (SkuSpecific mode + active row) or the flat meter (OrgFlat mode). Never both, never neither.
- The legacy sent_mailer SubscriptionItem stays attached on every org throughout the cohort phase. M5 removes it; cohort rollout does not.
- campaign_recipient_id is the billing idempotency key (Stripe identifier prefix mailer-{recipient_id} preserved for continuity).
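The rate card invariants and the idempotency key can be sketched in plain Python. RateCardEntry, tombstone, current_entry and meter_event_params are hypothetical. The dict mirrors the event_name, identifier and payload parameters of Stripe's billing meter event create call. It assumes the meters use Stripe's default payload keys (stripe_customer_id, value) with value "1" per mailpiece.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class RateCardEntry:
    # Hypothetical subset of billing_rate_card_entries columns.
    id: int
    organization_id: int
    billing_key: str
    unit_amount_cents: int
    active_at: Optional[datetime] = None    # None while the row is pending
    inactive_at: Optional[datetime] = None  # None while the row is open-ended

    def is_current(self, at: datetime) -> bool:
        # Half-open lifespan [active_at, inactive_at).
        if self.active_at is None or at < self.active_at:
            return False
        return self.inactive_at is None or at < self.inactive_at


def tombstone(entry: RateCardEntry, at: datetime) -> RateCardEntry:
    # Only the lifecycle column changes; substantive fields are copied as-is.
    return replace(entry, inactive_at=at)


def current_entry(
    entries: Iterable[RateCardEntry], organization_id: int, billing_key: str, at: datetime
) -> Optional[RateCardEntry]:
    matches = [
        e for e in entries
        if e.organization_id == organization_id and e.billing_key == billing_key and e.is_current(at)
    ]
    # The partial unique index should make this impossible in Postgres.
    if len(matches) > 1:
        raise RuntimeError(f"{len(matches)} current rate card rows for ({organization_id}, {billing_key})")
    return matches[0] if matches else None


def meter_event_params(event_name: str, stripe_customer_id: str, campaign_recipient_id: int) -> dict:
    # One event per send, keyed by the recipient so retries share an identifier.
    return {
        "event_name": event_name,
        "identifier": f"mailer-{campaign_recipient_id}",
        "payload": {"stripe_customer_id": stripe_customer_id, "value": "1"},
    }
```

Stripe enforces identifier uniqueness for meter events over a rolling window. Re-emitting for the same campaign_recipient_id after a worker retry therefore should not double-bill.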
4. Runtime diagram
```mermaid
flowchart TD
A[Campaign activation, proofer, or mailer send] --> P["preflight_campaign_billing(campaign)"]
P --> G1{stripe_customer_id<br/>+ active subscription?}
G1 -- no --> BLOCK[passed=False<br/>route=none<br/>BLOCK]
G1 -- yes --> SNAP[Load SubscriptionSnapshot<br/>Redis 30-min TTL<br/>else Stripe API]
SNAP --> MODE{billing_mode}
MODE -- OrgFlatMeter --> F1{flat item attached?<br/>sent_mailer or bfcm_send}
F1 -- no --> BLOCK
F1 -- yes --> F2{unit_amount + currency present?}
F2 -- no --> BLOCK
F2 -- yes --> F3{non-BFCM:<br/>unit_amount == mailer_price*100?}
F3 -- no --> BLOCK
F3 -- yes --> FOK[route=org_flat_meter<br/>passed=True<br/>emit at Stripe unit_amount]
MODE -- SkuSpecificMeter --> S1{active rate card row<br/>for org, billing_key?}
S1 -- no --> BLOCK
S1 -- yes --> S2{row's subscription_item_id<br/>in snapshot, matches price + meter?}
S2 -- no --> BLOCK
S2 -- yes --> S3{live unit_amount ==<br/>row.unit_amount_cents?}
S3 -- yes --> SOK[route=sku_specific_meter<br/>passed=True<br/>emit at row.unit_amount_cents]
S3 -- no --> WARN[+ warning:<br/>PER_SKU_PRICE_DRIFT]
WARN --> SOK
FOK --> SEND[Mailpiece ships<br/>app/celery/mailer.py]
SOK --> SEND
SEND --> EMIT[update_billing_meter<br/>idempotency=campaign_recipient_id]
EMIT --> STRIPE[(Stripe MeterEvent.create)]
STRIPE --> AUDIT[CustomerMeterBilled log line<br/>+ route + warnings]
classDef block fill:#fdd,stroke:#c00,color:#600;
classDef ok fill:#dff5e1,stroke:#2f9e44,color:#1b4d2b;
classDef warn fill:#fff4d6,stroke:#d99a00,color:#5c3d00;
class BLOCK block
class FOK,SOK,AUDIT ok
class WARN warn
```
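The two mode branches of the diagram can be sketched as follows. SnapshotItem, RateCardRow and Result are hypothetical simplifications; the real code works on SubscriptionSnapshot and PreflightOutcome. Flat items are assumed to be matched by meter event name. Each evaluator returns its first failure rather than collecting all of them.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional


@dataclass
class SnapshotItem:
    # Hypothetical flattening of one live subscription item in the snapshot.
    subscription_item_id: str
    price_id: str
    meter_event_name: str
    unit_amount: Optional[int]  # cents; None for tier-priced Prices
    currency: Optional[str]


@dataclass
class RateCardRow:
    # Hypothetical columns; only stripe_subscription_item_id and
    # unit_amount_cents are named in this PRD.
    stripe_subscription_item_id: str
    stripe_price_id: str
    meter_event_name: str
    unit_amount_cents: int


@dataclass
class Result:
    passed: bool
    route: str
    failures: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    billed_cents: Optional[int] = None


def evaluate_org_flat(items: list[SnapshotItem], mailer_price: Decimal, is_bfcm: bool) -> Result:
    wanted = "bfcm_send" if is_bfcm else "sent_mailer"
    item = next((i for i in items if i.meter_event_name == wanted), None)  # F1
    if item is None:
        return Result(False, "none", ["NO_FLAT_METER_ITEM_ATTACHED"])
    if item.unit_amount is None:  # F2
        return Result(False, "none", ["FLAT_METER_ITEM_MISSING_UNIT_AMOUNT"])
    if item.currency is None:
        return Result(False, "none", ["FLAT_METER_ITEM_MISSING_CURRENCY"])
    # F3 (non-BFCM only): Decimal keeps mailer_price * 100 exact.
    if not is_bfcm and item.unit_amount != mailer_price * 100:
        return Result(False, "none", ["FLAT_METER_PRICE_DRIFT"])
    return Result(True, "org_flat_meter", billed_cents=item.unit_amount)


def evaluate_sku_specific(items: list[SnapshotItem], row: Optional[RateCardRow]) -> Result:
    if row is None:  # S1
        return Result(False, "none", ["NO_RATE_CARD_ENTRY"])
    live = next(
        (i for i in items if i.subscription_item_id == row.stripe_subscription_item_id), None
    )
    # S2: the row's item must be live and match on price and meter.
    if live is None or (live.price_id, live.meter_event_name) != (row.stripe_price_id, row.meter_event_name):
        return Result(False, "none", ["RATE_CARD_STRIPE_DRIFT"])
    warnings = []
    if live.unit_amount != row.unit_amount_cents:  # S3: warn, but the DB value wins (D10)
        warnings.append("PER_SKU_PRICE_DRIFT")
    return Result(True, "sku_specific_meter", warnings=warnings, billed_cents=row.unit_amount_cents)
```

Carrying mailer_price as Decimal is an assumption about the column type. With floats, products such as 0.29 * 100 are not exact, and the F3 equality check could report FLAT_METER_PRICE_DRIFT on a correctly priced org.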
5. Usage patterns
5.1 Call the preflight
```python
from app.methods.billing_preflight import preflight_campaign_billing

outcome = preflight_campaign_billing(campaign)
if not outcome.passed:
    # Activation path (inside the route handler): bare reject with 422. The
    # campaign stays in its prior state (Prospect/Draft/Paused) and the operator
    # sees the failure inline. No auto-pause and no CAMPAIGN_AUTO_PAUSED emit:
    # there is no Active campaign to pause yet, and a synchronous pause + CSM
    # ticket on a failed click is the wrong place to put that signal. The 422
    # payload is {"error": "billing_not_ready", "failures": [...], "route": ...};
    # the dashboard renders the failure codes inline.
    return reject(outcome.failures)

# Send path (proofer), on the same `not outcome.passed` check: transition the
# recipient to an error state with the first failure code AND auto-pause the
# campaign so the block surfaces in the campaign list and CAMPAIGN_AUTO_PAUSED
# reaches CSM. The proofer is the right home for the pause: by the time billing
# breaks mid-flight, the campaign is Active, so flipping it to Paused is a
# meaningful state change and the CSM event reflects a real degradation.

# On pass, outcome.route is "org_flat_meter" or "sku_specific_meter", and
# outcome carries the emit identifiers.
```
Where auto-pause lives: on the proofer side (app/celery/proofer.py), not the activation route. Activation just blocks the Prospect/Draft → Active transition and returns 422; the dashboard surfaces the failure codes inline. Once a campaign is Active and billing drifts, the proofer is the first code path to notice — that's where pausing + emitting CAMPAIGN_AUTO_PAUSED (the same lever cohort ops already use for per-campaign mismatches) actually buys CSM something to act on.
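The activation/proofer split can be summarized as one decision function. FailureAction and handle_failed_preflight are hypothetical. The 422 body matches the payload shape described in the snippet above. The proofer branch only records which recipient error code and pause the real code would apply.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class FailureAction:
    http_status: Optional[int]           # activation path only
    response_body: Optional[dict]        # activation path only
    recipient_error_code: Optional[str]  # send path only
    auto_pause: bool                     # True => campaign paused, CAMPAIGN_AUTO_PAUSED emitted


def handle_failed_preflight(
    path: Literal["activation", "proofer"], failures: list[str], route: str
) -> FailureAction:
    if path == "activation":
        # Campaign is not Active yet: reject inline, nothing to pause.
        body = {"error": "billing_not_ready", "failures": failures, "route": route}
        return FailureAction(422, body, None, False)
    # Campaign is Active: fail the recipient with the first code and pause.
    return FailureAction(None, None, failures[0], True)
```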
Billing pauses use a 7-day resume window (app/celery/proofer.py). A standard auto-pause schedules resume_date = now + 48h, but the 48h cadence oscillates pause→auto-resume→re-pause when Stripe state is still broken at the timer's expiry. Billing-not-ready pauses raise the window to 7 days so the Linear ticket emitted by handle_campaign_paused (app/core/integrations/inngest/functions/campaigns.py) is the operative recovery signal rather than the timer. pause_reason lives only on the CAMPAIGN_AUTO_PAUSED event payload — it is not persisted on campaign.settings, since the ticket itself is the system of record for the pause cause. The Inngest handler's action_key = f"campaign_paused:campaign={campaign_id}" dedupes follow-on re-pauses inside the window to one Linear ticket per campaign. Implementation-plan M3.4 records this as the chosen approach and notes the cleanup follow-up.
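A toy model makes the resume-window choice easier to see. The sketch assumes the campaign auto-resumes exactly at resume_date and that the proofer re-pauses it immediately while Stripe is still broken. The constants and function names are hypothetical. The action_key format is the one quoted above.

```python
from datetime import datetime, timedelta

STANDARD_RESUME_WINDOW = timedelta(hours=48)
BILLING_RESUME_WINDOW = timedelta(days=7)


def resume_date(paused_at: datetime, billing_not_ready: bool) -> datetime:
    # Billing-not-ready pauses wait 7 days so the Linear ticket, not the timer, drives recovery.
    window = BILLING_RESUME_WINDOW if billing_not_ready else STANDARD_RESUME_WINDOW
    return paused_at + window


def linear_action_key(campaign_id: int) -> str:
    # Identical for every re-pause of one campaign, so the handler can dedupe to one ticket.
    return f"campaign_paused:campaign={campaign_id}"


def pauses_during_outage(outage: timedelta, window: timedelta) -> int:
    """Pause events while Stripe stays broken, under the toy model above."""
    pauses, elapsed = 0, timedelta(0)
    while elapsed < outage:
        pauses += 1        # proofer pauses (or re-pauses) the campaign
        elapsed += window  # timer expires and the campaign auto-resumes
    return pauses
```

For a five-day Stripe outage, the 48h window yields three pauses (and three CAMPAIGN_AUTO_PAUSED events). The 7-day window yields one.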
5.2 Promote an org to SkuSpecificMeter
Note: the billing_migration helper CLIs below were carved off to project/brandon-billing-migration-scripts (ENG-3035) after the migration; the commands are retained here as a record of the M2.x promotion flow. The equivalent live logic now lives in app/methods/billing_rate_card.py.
```bash
# Already on the org? Read-only triage first.
just script billing_migration inspect_billing_mode --org-id <id>

# Seed pending rate card rows for the org's active billing keys.
just script billing_migration seed_rate_card_entries --org-id <id>

# Provision Stripe + attach as new SubscriptionItem alongside the legacy item.
just script billing_migration provision_meter --org-id <id> --billing-key <key>
# → stamps stripe_subscription_item_id + active_at on the row; busts Redis snapshot.

# Flip OrganizationConfiguration.billing_mode → "sku_specific_meter" only
# once every active billing_key for the org has an active rate card row.
```
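The flip precondition in the last comment can be checked mechanically before billing_mode changes. RowState, keys_blocking_flip and can_flip_to_sku_specific are hypothetical helpers, not part of app/methods/billing_rate_card.py. They only encode the rule that every active billing key needs an active row.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RowState:
    billing_key: str
    active: bool  # True once provision_meter has stamped active_at on the row


def keys_blocking_flip(active_billing_keys: Iterable[str], rows: Iterable[RowState]) -> set[str]:
    """Billing keys that would hard-block with NO_RATE_CARD_ENTRY after the flip."""
    ready = {row.billing_key for row in rows if row.active}
    return set(active_billing_keys) - ready


def can_flip_to_sku_specific(active_billing_keys: Iterable[str], rows: Iterable[RowState]) -> bool:
    return not keys_blocking_flip(active_billing_keys, rows)
```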
5.3 Kill switch (cohort phase)
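Tombstone a single rate card row with set_billing_rate_card_inactive (the manual kill switch until the M3.2 reconciliation worker ships; see §7 G9).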
If the org remains SkuSpecificMeter, the next send for that key hard-blocks with NO_RATE_CARD_ENTRY. To revert to flat billing, also flip billing_mode back to OrgFlatMeter — the legacy sent_mailer item is still attached per invariant 5.
5.4 Reconcile a drifted org
PER_SKU_PRICE_DRIFT warnings or RATE_CARD_STRIPE_DRIFT failures: re-run provision_meter against the DB row. Never patch the DB to mirror Stripe.
6. Concepts to be aware of
| Concept | What it is | Where it lives |
|---|---|---|
| Billing key | CampaignTemplateSize value, plus bfcm_send for BFCM campaigns. The single pricing axis (D1, D5). | app/core/integrations/stripe/meters.py |
| Billing mode | Per-org routing key on OrganizationConfiguration: org_flat_meter (default) or sku_specific_meter. Authoritative; there is no implicit promotion. | app/core/models/organizations.py |
| Rate card entry | One row per (org, billing_key) version. Append-only; substantive fields immutable; lifespan [active_at, inactive_at). | app/core/models/billing_rate_card.py |
| SubscriptionSnapshot | Pydantic slice of live Stripe state, cached in Redis for 30 minutes per org. | app/methods/billing_preflight.py |
| PreflightOutcome | Dataclass: passed, route, rate_card_entry_id, failures, warnings, diagnostics. | app/methods/billing_preflight.py |
| Canonical drift | Comparison of mailer_price against migration-rules.md R1 defaults. Diagnostic-only (inspector); never blocks sends. | migration-rules.md, _org_flat_canonical_drift_diagnostics |
| A6_NL pin | Structural: unit_amount_cents must equal the canonical NL A6 default at the rate-card row. Enforced at provisioning time. | migration-rules.md R2 |
| Universal-backup invariant | The flat sent_mailer item stays attached on every org through cohort rollout. M5 removes it. | §3.4 invariant 5 |
| DB → Stripe direction (D10) | All writes flow DB → Stripe via the provisioner. Stripe is read only at preflight and daily reconciliation. | app/methods/billing_rate_card.py |
Production call sites
- app/routes/campaign_routes.py:413 (activation)
- app/celery/proofer.py:50 (proofing)
- app/celery/mailer.py:89 (pre-send batch gate)
- app/celery/mailer.py:476 (per-recipient send)
Inspector and provisioner scripts call the same preflight_campaign_billing entry point (there is no separate gateway and no waterfall; see §7 G1). They read the full PreflightOutcome, including inspector-only diagnostics, for operator triage.
7. Gaps: decided
Resolutions below are decisions, not options. Branch code or docs follow.
| # | Gap | Decision | Status |
|---|---|---|---|
| G1 | Earlier PRD draft prescribed a separate billing_gateway.py seam wrapping preflight_campaign_billing so production send-path preflights could be tagged with their own breadcrumb. | Won't do; keep inlined. Extraction was attempted (M3.1) and reverted: three production call sites wrapping one passthrough function plus a breadcrumb wasn't earning the indirection. Production and operator scripts both call preflight_campaign_billing directly; the existing billing.preflight breadcrumb is sufficient observability. Revisit only if a future need (rollout gates, kill switches) actually motivates a chokepoint. | Closed. |
| G2 | SKU_MODE_REQUIRES_RATE_CARD documented, not in enum. | Drop from docs. NO_RATE_CARD_ENTRY + RATE_CARD_STRIPE_DRIFT already cover every fall-through case the backstop would have caught. | Done in this proposal. |
| G3 | Canonical drift documented as runtime warning + A6_NL failure; code emits inspector-only diagnostics. | Diagnostic-only stands. Runtime path stays quiet; A6_NL is structurally pinned at provisioning time (migration-rules.md R2), which is the real guard rail. Update prd.md §D3.1 to drop the runtime-warning framing. | Done in this proposal §3.3. |
| G4 | Route literal mismatch (org_flat vs org_flat_meter). | Use the enum-derived literals everywhere: org_flat_meter, sku_specific_meter, none. Single source = BillingMode.value. Doc strings in PreflightOutcome updated to match. | Doc fix + verify _outcome_blocked returns the suffixed strings. |
| G5 | Stale waterfall diagram in prd.md §13.1. | Replace with §4 diagram of this proposal. | Done. |
| G6 | D12 self-contradiction. | Cut the "Live waterfall" subsection and "billing_mode is descoped" paragraph. | Lands when prd.md is retired in favor of this file. |
| G8 | LEGACY_PRICE_DRIFT → FLAT_METER_PRICE_DRIFT rename incomplete in prd.md. | Find-and-replace in any remaining prd.md prose. | Lands with G6 retirement. |
| G9 | M3.2 reconciliation worker documented, not built. | Keep in scope; cohort rollout depends on a backstop. Mark as not-yet-shipped in §3 and schedule after cohort onboarding stabilizes. Behavior stays as-spec'd (auto-tombstone at D7 thresholds, Sev-2 page). Until shipped, the kill switch is a manual set_billing_rate_card_inactive. | New Linear ticket; sequence after current cohort. |
| G10 | `/v1/billing/<org>/costs` hardcodes $0.65. | Shipped as M3.5. app/methods/billing.py::get_billing_response now branches on BillingMode: OrgFlatMeter returns mailer_price; SkuSpecificMeter resolves list_billing_rate_card_entries(org_id) and exposes a per-billing_key rate_card breakdown alongside the legacy mailer_unit_cost scalar. | Closed. |
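G10's branching can be sketched as a response builder. The function name, the RateCardEntryView type and the exact field layout are assumptions. In particular, this PRD does not specify the value of the legacy mailer_unit_cost scalar for SkuSpecificMeter orgs, so the sketch keeps it at mailer_price in both modes.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any


@dataclass
class RateCardEntryView:
    billing_key: str
    unit_amount_cents: int
    currency: str


def billing_costs_response(
    billing_mode: str, mailer_price: Decimal, entries: list[RateCardEntryView]
) -> dict[str, Any]:
    # No hardcoded 0.65: the scalar comes from the org's own configuration.
    response: dict[str, Any] = {"mailer_unit_cost": str(mailer_price)}
    if billing_mode == "sku_specific_meter":
        # Per-billing_key breakdown, e.g. from list_billing_rate_card_entries(org_id).
        response["rate_card"] = {
            e.billing_key: {"unit_amount_cents": e.unit_amount_cents, "currency": e.currency}
            for e in entries
        }
    return response
```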
8. What this proposal removes from the current PRD
- The historical D-list framing (D1–D12). The substantive decisions are folded into §2 (Requirements) and §3 (Model). migration-rules.md remains the canonical home for the price table and A6_NL invariant.
- The atomic-flip / cutover-boundary discussion (old D6, the M5 ratchet path). The runtime PRD doesn't need to carry future cleanup options.
- The live-waterfall section of D12. Superseded by the mode-dispatched model. The §13.1 diagram is replaced by §4 above.
- Three-layers / two-drift-checks (D3.1). The runtime model only needs to know that canonical comparison is a diagnostic, not a gate. The full rule belongs in migration-rules.md.
- The "personas & impact" + "non-goals" + "customer-facing changes" sections. These are stable and can live in a separate docs/billing/product-context.md if we want to keep them; they don't change with each implementation decision.
9. Suggested doc layout going forward
```text
docs/billing/
├── prd.md                                # this file: runtime contract + invariants + reason codes
├── product-context.md                    # personas, non-goals, invoice format, change log (NEW, extracted)
├── migration-rules.md                    # canonical price table, A6_NL pin, bucket rules (UNCHANGED)
├── provisioning-behavior.md              # provisioner reuse/swap behavior + dry-run plan (NEW)
└── self-managed-billing-investigation.md # one-off research (UNCHANGED)
```
cohort-simplification-design.md has been retired — its load-bearing content is folded into §3 + §5.2 here.