2025-01-18 5 min read

dbt in Production: Patterns That Scale From Startup to Enterprise

Moving dbt from local development to production requires deliberate architectural choices. Learn the patterns that let you grow without rebuilding your data stack.

Your dbt project works perfectly on your laptop. Models run in seconds, tests pass, and everything feels clean. Then you deploy to production and reality hits: long-running transformations, cascading failures, team members stepping on each other's work, and zero visibility into what's actually happening.

The jump from startup to enterprise dbt is less about discovering new tools and more about implementing the right patterns early. The teams that scale smoothly typically start with practices that seem excessive for 10 models but become essential at 100. This post covers the patterns we've seen work at LavaPi across organizations at different scales.

Modular Project Structure

Startups often organize dbt by database schema. That works until you have 200 models and no clear ownership. The better approach is organizing by business domain or team, even if you only have a few models.

code
models/
├── marts/
│   ├── finance/
│   │   ├── _finance__models.yml
│   │   ├── revenue.sql
│   │   └── expenses.sql
│   ├── operations/
│   │   ├── _operations__models.yml
│   │   ├── efficiency.sql
│   │   └── capacity.sql
├── staging/
│   ├── stripe/
│   ├── hubspot/
│   └── postgres/
└── intermediate/
    └── finance_base.sql

This structure makes it obvious which team owns which models and keeps dependencies visible. When someone changes a transformation, you immediately know who to notify.
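Because marts only build on staging and intermediate models, lineage is visible in the model file itself via ref(). A minimal sketch of the finance mart above (the stg_stripe__payments model name is an illustrative assumption):

```sql
-- models/marts/finance/revenue.sql (illustrative)
-- Depends only on staging and intermediate models, never raw tables
select
    orders.order_id,
    orders.order_date,
    payments.amount as revenue_amount
from {{ ref('stg_stripe__payments') }} as payments
join {{ ref('finance_base') }} as orders
    on payments.order_id = orders.order_id
```

Anyone reviewing this file can see its upstream dependencies without opening the DAG.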

Naming Conventions Matter

Consistent naming prevents mental overhead and catches mistakes at review time. Set sensible materialization defaults per layer, then use prefixes that describe each model's purpose:

yaml
# dbt_project.yml
models:
  my_project:
    +materialized: table
    staging:
      +materialized: view
    marts:
      +materialized: table
    intermediate:
      +materialized: view

Prefix with stg_ for staging, int_ for intermediate, and use domain prefixes in marts (fct_ for facts, dim_ for dimensions). This convention makes it trivial to understand a model's role without reading the SQL.
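Applied to the structure above, a staging model name encodes both its layer and its source system. A sketch, assuming a hypothetical Stripe payments table:

```sql
-- models/staging/stripe/stg_stripe__payments.sql (hypothetical example)
-- stg_ prefix: light renaming and type cleanup of one source table, nothing more
select
    id as payment_id,
    order_id,
    amount / 100.0 as amount,  -- Stripe stores amounts in cents
    created as created_at
from {{ source('stripe', 'payments') }}
```

The double underscore separates the source name from the table name, so stg_stripe__payments is unambiguous even in a project with many sources.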

Test-Driven Transformations

Tests are your safety net in production. Not the occasional dbt test, but comprehensive coverage of business logic.

yaml
version: 2
models:
  - name: revenue
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: revenue_amount
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "> 0"  # column name is prepended automatically
    tests:
      - dbt_utils.recency:
          datepart: day
          field: created_at
          interval: 1

Tests catch silent failures before stakeholders notice missing revenue in dashboards. Start with unique and not_null constraints, then add business logic tests. The recency test above alerts you if data stops arriving within 24 hours—critical for downstream operations.
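Business rules that span multiple models fit naturally into singular tests: a SQL file in the tests/ directory that returns failing rows. A sketch against the revenue model above (file name and aggregation are assumptions):

```sql
-- tests/assert_no_negative_daily_revenue.sql (hypothetical)
-- The test fails if this query returns any rows
select
    date_trunc('day', created_at) as revenue_day,
    sum(revenue_amount) as total_revenue
from {{ ref('revenue') }}
group by 1
having sum(revenue_amount) < 0
```

dbt test picks these up automatically alongside the schema tests defined in YAML.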

Scaling Execution and Observability

Local development and production have different needs. In production, you need visibility into what's happening across hundreds of models.

Selective Execution

Run the full DAG nightly but use incremental models and selective refreshes during the day:

bash
# Full refresh nightly
dbt run --full-refresh

# Incremental updates during business hours (state:modified compares
# against the last production run's artifacts, passed via --state)
dbt run --select state:modified+ --state path/to/prod-artifacts

# Specific team's models
dbt run --select tag:finance
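The incremental models mentioned above only reprocess new rows on each run, and a tag in the model config is what makes the tag:finance selector work. A minimal sketch (column and model names are assumptions):

```sql
-- models/marts/finance/revenue.sql as an incremental model (illustrative)
{{ config(
    materialized='incremental',
    unique_key='order_id',
    tags=['finance']
) }}

select order_id, created_at, revenue_amount
from {{ ref('stg_stripe__payments') }}
{% if is_incremental() %}
-- On incremental runs, only process rows newer than what is already loaded
where created_at > (select max(created_at) from {{ this }})
{% endif %}
```

The nightly --full-refresh rebuilds these models from scratch, which catches any drift the incremental logic misses.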

Observability

dbt Cloud provides built-in observability, but whatever platform you use, track:

  • Model execution time trends
  • Test failure rates
  • Data freshness across marts
  • Row counts for anomaly detection

Set alerts on execution time spikes and test failures. When a model suddenly takes 10x longer to run, you want to know before users report empty dashboards.
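Freshness can also be enforced at the source level, so dbt source freshness fails loudly when a feed goes stale instead of letting stale data flow into marts. A sketch, assuming a hypothetical _loaded_at column on the Stripe source:

```yaml
# models/staging/stripe/_stripe__sources.yml (hypothetical)
version: 2
sources:
  - name: stripe
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: payments
```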

Environment Separation

Don't test in production. Maintain separate development, staging, and production databases with appropriate dbt environments:

yaml
# profiles.yml
my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      schema: analytics_dev
      threads: 4
    prod:
      type: postgres
      schema: analytics
      threads: 8

Developers work in analytics_dev with full freedom to break things. Pull requests trigger runs against staging. Only merged code runs against production.
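The pull-request step can be wired into CI with state deferral, so each PR builds and tests only the models it changed. A GitHub Actions sketch; the workflow name, artifact path, and a staging target in profiles.yml are all assumptions:

```yaml
# .github/workflows/dbt-ci.yml (hypothetical)
name: dbt CI
on: pull_request
jobs:
  dbt-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-postgres
      # Build only changed models, deferring unchanged upstream refs
      # to the last production run's artifacts
      - run: dbt build --select state:modified+ --defer --state path/to/prod-artifacts --target staging
```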

The Real Bottleneck

At startup scale, adding another 50 models feels easy. The bottleneck at enterprise scale isn't compute—it's clarity. Teams need to understand which models they own, what they depend on, and when they're failing.

Start building this clarity now. Organize by domain, test aggressively, and invest in observability. The patterns that seem overengineered for 20 models become your foundation for scaling to 500.


LavaPi Team

Digital Engineering Company
