dbt in Production: Patterns That Scale From Startup to Enterprise
Moving dbt from local development to production requires deliberate architectural choices. Learn the patterns that let you grow without rebuilding your data stack.
Your dbt project works perfectly on your laptop. Models run in seconds, tests pass, and everything feels clean. Then you deploy to production and reality hits: long-running transformations, cascading failures, team members stepping on each other's work, and zero visibility into what's actually happening.
The jump from startup to enterprise dbt is less about discovering new tools and more about implementing the right patterns early. The teams that scale smoothly typically start with practices that seem excessive for 10 models but become essential at 100. This post covers the patterns we've seen work at LavaPi across organizations at different scales.
Modular Project Structure
Startups often organize dbt by database schema. That works until you have 200 models and no clear ownership. The better approach is organizing by business domain or team, even if you only have a few models.
```
models/
├── marts/
│   ├── finance/
│   │   ├── _finance__models.yml
│   │   ├── revenue.sql
│   │   └── expenses.sql
│   ├── operations/
│   │   ├── _operations__models.yml
│   │   ├── efficiency.sql
│   │   └── capacity.sql
├── staging/
│   ├── stripe/
│   ├── hubspot/
│   └── postgres/
└── intermediate/
    └── finance_base.sql
```
This structure makes it obvious which team owns which models and keeps dependencies visible. When someone changes a transformation, you immediately know who to notify.
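Ownership can also be made explicit in the schema files themselves via dbt's `meta` key. A minimal sketch — the `owner` value is an illustrative team convention, not a dbt built-in:

```yaml
# models/marts/finance/_finance__models.yml (sketch)
version: 2
models:
  - name: revenue
    meta:
      owner: "@finance-team"  # illustrative: surfaces in dbt docs, useful for alert routing
```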
Naming Conventions Matter
Consistent naming prevents mental overhead and catches mistakes at review time. Use prefixes that describe the model's purpose:
```yaml
# dbt_project.yml
models:
  my_project:
    materialized: table
    staging:
      materialized: view
    marts:
      materialized: table
    intermediate:
      materialized: view
```
Prefix with:
- stg_ for staging models that clean and rename raw source data
- int_ for intermediate transformations
- fct_ for fact tables
- dim_ for dimension tables
Test-Driven Transformations
Tests are your safety net in production. Not the occasional dbt test, but comprehensive coverage of business logic.
```yaml
version: 2
models:
  - name: revenue
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: revenue_amount
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "revenue_amount > 0"
    tests:
      - dbt_utils.recency:
          datepart: day
          interval: 1
          field: created_at
```
Tests catch silent failures before stakeholders notice missing revenue in dashboards. Start with unique and not_null constraints, then add business logic tests. The recency test above alerts you if data stops arriving within 24 hours—critical for downstream operations.
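Business rules that span rows or tables fit naturally as singular tests: a SQL file in tests/ whose returned rows dbt counts as failures. A hedged sketch, reusing the model and column names from the example above:

```sql
-- tests/assert_no_future_revenue.sql (illustrative singular test)
-- dbt treats any rows this query returns as test failures
select *
from {{ ref('revenue') }}
where created_at > current_timestamp
```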
Scaling Execution and Observability
Local development and production have different needs. In production, you need visibility into what's happening across hundreds of models.
Selective Execution
Run the full DAG nightly but use incremental models and selective refreshes during the day:
```bash
# Full refresh nightly
dbt run --full-refresh

# Incremental updates during business hours
# (state:modified requires a previous manifest passed via --state)
dbt run --select state:modified+ --state path/to/prod-artifacts

# Specific team's models
dbt run --select tag:finance
```
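The incremental and tag selectors assume models opt in through their config. A hedged sketch of such a model — the file path and the `stg_stripe__payments` staging model are hypothetical:

```sql
-- models/marts/finance/revenue.sql (illustrative)
-- Incremental materialization plus a team tag for selective runs
{{ config(materialized='incremental', unique_key='order_id', tags=['finance']) }}

select
    order_id,
    revenue_amount,
    created_at
from {{ ref('stg_stripe__payments') }}  -- hypothetical staging model

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what's already loaded
  where created_at > (select max(created_at) from {{ this }})
{% endif %}
```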
Observability
dbt Cloud provides built-in observability, but whatever platform you use, track:
- Model execution time trends
- Test failure rates
- Data freshness across marts
- Row counts for anomaly detection
Set alerts on execution time spikes and test failures. When a model suddenly takes 10x longer to run, you want to know before users report empty dashboards.
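One lightweight way to spot execution-time spikes outside dbt Cloud is to parse the run_results.json artifact dbt writes to target/ after each invocation. A minimal Python sketch; the 60-second threshold is an illustrative default, not a dbt setting:

```python
import json


def slow_models(run_results_path, threshold_seconds=60.0):
    """Return (unique_id, execution_time) pairs for models slower than the threshold.

    Reads dbt's target/run_results.json artifact; the threshold is an
    illustrative default, tune it to your own baseline.
    """
    with open(run_results_path) as f:
        artifact = json.load(f)
    offenders = [
        (r["unique_id"], r["execution_time"])
        for r in artifact["results"]
        if r.get("execution_time", 0) > threshold_seconds
    ]
    # Slowest first, so the worst offender tops the alert
    return sorted(offenders, key=lambda pair: pair[1], reverse=True)
```

Wire this into the job that runs dbt and page on the result, rather than waiting for users to report empty dashboards.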
Environment Separation
Don't test in production. Maintain separate development, staging, and production databases with appropriate dbt environments:
```yaml
# profiles.yml
# (connection fields like host, user, and dbname omitted for brevity)
my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      schema: analytics_dev
      threads: 4
    prod:
      type: postgres
      schema: analytics
      threads: 8
```
Developers work in analytics_dev with full freedom to break things. Pull requests trigger runs against staging. Only merged code runs against production.
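A sketch of the pull-request step, assuming a GitHub Actions runner, a `staging` target in profiles.yml, and production artifacts downloaded to ./prod-artifacts beforehand; the workflow name and paths are illustrative:

```yaml
# .github/workflows/dbt_pr.yml (illustrative CI sketch)
name: dbt PR checks
on: pull_request
jobs:
  dbt-staging:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-postgres
      # Build only modified models and their children against staging,
      # deferring unchanged upstream refs to the production manifest.
      - run: dbt build --select state:modified+ --defer --state prod-artifacts --target staging
```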
The Real Bottleneck
At startup scale, adding another 50 models feels easy. The bottleneck at enterprise scale isn't compute—it's clarity. Teams need to understand which models they own, what they depend on, and when they're failing.
Start building this clarity now. Organize by domain, test aggressively, and invest in observability. The patterns that seem overengineered for 20 models become your foundation for scaling to 500.
LavaPi Team
Digital Engineering Company