Skip to content

Order Aggregates

File: app/core/models/order_aggregates.py

Pre-computed daily order metrics for fast campaign reporting.

Key fields (DailyAggregate)

  • Dimensions: organization_id, campaign_id, day_bucket (DATE), is_holdout
  • Metrics (JSONB): total_orders, revenue, first_purchase_orders/revenue, repeat_orders/revenue, subscription_orders/revenue, discount_matched_orders, unique_customers, new_subscribers, aov

Computation

  • Frequency: Every 15 minutes via Celery Beat
  • Source: Raw attributions table, filtered by passes_rules=true, archived=false, order_value>0
  • Idempotent: ON CONFLICT DO UPDATE for safe re-runs
  • Progress tracking: attribution_aggregation_progress table tracks last_processed_day per org
  • Sliding window: Each tick recomputes buckets touched in the last ORDER_AGGREGATION_SLIDING_WINDOW_DAYS (default 2) days plus any updated since the previous run. Today's in-progress bucket is included — the upsert refreshes it every tick so dashboards show today's running total. (Hourly granularity used to filter out the in-progress hour; carrying that to daily would have stalled dashboards 24h — the per-tick refresh of today's bucket is the conscious trade-off.)

Holdout-Aware Aggregates

Daily aggregates are split by the is_holdout boolean dimension. Experiment analysis reads is_holdout=True rows directly from pre-computed aggregates instead of scanning raw attributions — this powers fast A/B test reporting on the dashboard.

Reporting Features

Bulk Campaign Stats

GET /v1/reporting/summary/<org_id>/campaigns — fetches stats for all campaigns in a single pass via generate_bulk_campaign_stats_from_aggregates(). Partitions daily + recipient totals by campaign and holdout status, computes experiment results per campaign.

Excluded Recipient Compensation

When certain recipients are excluded from stats (e.g., during experiment analysis), compute_excluded_recipient_stats() queries raw attributions for just those recipients and subtracts their totals from the pre-computed aggregates. This ensures excluded recipients match what the raw reporting path would produce.

Rolling Stats from Aggregates

generate_rolling_stats_from_daily_aggregates() takes daily aggregate rows (one per (campaign, day, holdout)) and applies rolling windows over the time series. AOV is computed from daily totals (revenue / orders) rather than averaging per-row AOVs — avoiding mean-of-means errors.

Summary-Only Mode

The exclude_rolling_stats flag on generate_order_stats_from_aggregates() skips the daily breakdown and uses fetch_reporting_recipient_totals() directly. This optimizes summary-only endpoints that don't need the full time series.

Admin rebuild

Operators can rebuild a single org or one campaign's aggregates out-of-band (e.g. after an attribution rule change or to validate a fix) without shelling into a container.

  • Endpoint: POST /api/v1/admin/aggregations/order/rebuild — admin-only, body {organization_id, campaign_id?, full_refresh?}. full_refresh defaults to false (incremental); pass true to additionally sweep stale rows after the rebuild. Returns 202 {message, task_id, ...}. See routes/aggregation_trigger_routes.py.
  • Task: attribution_aggregates.rebuild_order_aggregates_for_organization(organization_id, campaign_id=None, full_refresh=False) on the slow queue, deduped by QueueOnce on (organization_id, campaign_id). Delegates to methods/aggregation_rebuild.backfill_order_aggregates and resets the Redis reporting cache on completion. The same method powers the scripts/backfill_order_aggregates.py CLI.
  • Mark-and-sweep on full_refresh=True: existing rows stay visible the whole run. The orchestrator captures db_now() before the chunk loop, every chunk's UPSERT refreshes last_updated = NOW(), and at the end delete_stale_org_aggregates(org_id, campaign_id, before=run_started_at) removes only rows the run did not touch. Readers never see an empty stats window.

Config-triggered rebuild

Saving a change to organizations.configuration.excluded_attributions from PUT /api/v1/organizations/<id> dispatches recalc_campaign_pipeline(org_id, campaign_id=None, full_refresh=True) (see app/celery/recalc_pipeline.py) rather than the order-only admin task. The pipeline runs order rebuild → recipient rebuild → (skipped at org scope) stats warm; the recipient rebuild is redundant for an attributions-only change but keeps one entry point for all recalc dispatches on the same admin, low-frequency operation. QueueOnce on (organization_id, campaign_id=None) dedupes rapid back-to-back saves on the same org.

Key files

  • Model: core/models/order_aggregates.py
  • Rebuild orchestration: methods/aggregation_rebuild.py
  • Scheduled + admin-trigger task: celery/attribution_aggregates.py
  • Config-triggered recalc pipeline: celery/recalc_pipeline.py
  • Admin route: routes/aggregation_trigger_routes.py
  • Backfill CLI: scripts/backfill_order_aggregates.py
  • Reporting: methods/reporting_aggregates.py
  • Routes: routes/reporting_routes.py