dbt-performance
Optimizing dbt and Snowflake performance through materialization choices, clustering keys, warehouse sizing, and query optimization. Use this skill when addressing slow model builds, optimizing query performance, sizing warehouses, implementing clustering strategies, or troubleshooting performance issues.
dbt Performance Optimization
Purpose
Transform AI agents into experts on dbt and Snowflake performance optimization, providing guidance on choosing optimal materializations, leveraging Snowflake-specific features, and implementing query optimization patterns for production-grade performance.
When to Use This Skill
Activate this skill when users ask about:
- Optimizing slow dbt model builds
- Choosing appropriate materializations for performance
- Implementing Snowflake clustering keys
- Sizing warehouses appropriately
- Converting models to incremental for performance
- Optimizing query patterns and SQL
- Troubleshooting performance bottlenecks
- Using Snowflake performance features (Gen2, query acceleration, search optimization)
Official Snowflake Documentation: Query Performance
Materialization Performance
Choose the Right Materialization
| Materialization | Build Time | Query Time | Best For |
|---|---|---|---|
| ephemeral | Fast | Varies | Staging, reusable logic |
| view | Instant | Slow | Always-fresh simple transforms |
| table | Slow | Fast | Dimensions, complex logic |
| incremental | Fast | Fast | Large facts (millions+ rows) |
Guidelines:
- Use `ephemeral` for staging (fast, no storage)
- Use `table` for dimensions
- Use `incremental` for large facts
When to Change Materializations
Change Ephemeral/View to Table When:
1. Memory Constraints
Queries failing or running slowly due to memory limitations:
```sql
-- Change from ephemeral to table
{{ config(materialized='table') }}
```
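To confirm that memory pressure is actually the issue, check for spilling in query history (a sketch; the spill columns live in SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY and the model-name filter is illustrative):

```sql
-- Non-zero spill bytes mean the warehouse ran out of memory for the query
select query_id, bytes_spilled_to_local_storage, bytes_spilled_to_remote_storage
from snowflake.account_usage.query_history
where query_text ilike '%your_model_name%'
order by start_time desc
limit 5;
```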
2. CTE Reuse
Same intermediate model referenced multiple times downstream:
```sql
-- If int_customers__metrics is used by 3+ downstream models
{{ config(materialized='table') }}  -- Materialize to avoid re-computation
```
3. Functions on Join Columns
Transformations in JOIN conditions prevent optimization:
```sql
-- ❌ BAD: Functions on join columns (forces full table scans)
select *
from {{ ref('stg_customers') }} c
join {{ ref('stg_orders') }} o
  on upper(trim(c.customer_email)) = upper(trim(o.customer_email))

-- ✅ GOOD: Materialize cleaned columns as a table first
{{ config(materialized='table') }}

select
    customer_id,
    upper(trim(customer_email)) as customer_email_clean  -- Pre-compute once
from {{ ref('stg_customers') }}
```
Performance Impact: Ephemeral → Table trades storage for compute efficiency
Change Table to Incremental When:
1. Large Data Volumes
Table has millions+ rows and full refreshes take too long:
```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    cluster_by=['order_date']
) }}

select * from {{ ref('stg_orders') }}

{% if is_incremental() %}
where order_date > (select max(order_date) from {{ this }})
{% endif %}
```
2. Append-Only Data
Event logs, clickstreams, transaction history
3. Time-Based Updates
Daily/hourly data loads with time-based filtering
Performance Impact: Table → Incremental reduces build time at the cost of added complexity
Snowflake-Specific Optimizations
Clustering Keys
Improve query performance on large tables:
```sql
{{ config(
    materialized='table',
    cluster_by=['order_date', 'customer_id']
) }}
```
Best Practices:
- Use 1-4 columns maximum
- Order columns by cardinality (low to high)
- Include common WHERE clause columns
- Include JOIN key columns
- Monitor clustering health with `SYSTEM$CLUSTERING_INFORMATION()`, as shown below
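A minimal monitoring query (a sketch; the table and column names are illustrative):

```sql
-- Returns clustering depth and overlap statistics as JSON
select system$clustering_information(
    'analytics.marts.fct_orders',    -- fully qualified table name (hypothetical)
    '(order_date, customer_id)'      -- clustering columns to evaluate
);
```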
Example with Multiple Keys:
```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    cluster_by=['event_date', 'event_type', 'user_id']
) }}
```
Official Snowflake Docs: Clustering Keys
Warehouse Sizing
```sql
{{ config(
    snowflake_warehouse='LARGE_WH'  -- For complex transformations
) }}
```
Optimal Sizing Goal: ~500 Micropartitions (MPs) per Node
Snowflake stores data in micropartitions (~16MB compressed). The warehouse sizing goal is to maintain approximately 500 MPs scanned per node for optimal performance. Too few MPs per node underutilizes the warehouse; too many causes compute skew and spilling.
Sizing Formula:
Nodes needed ≈ Total MPs scanned / 500 (then choose the warehouse size with that many nodes from the table below)
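For example, a model that scans roughly 6,000 micropartitions needs about 6,000 / 500 = 12 nodes, which falls between a Large (8 nodes) and an X-Large (16 nodes); the X-Large keeps MPs per node closer to the 500 target.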
Quick Reference Table (MPs scanned → Recommended warehouse):
| MPs Scanned | Warehouse Size | Nodes | MPs per Node |
|---|---|---|---|
| 500 | XS | 1 | 500 |
| 1,000 | S | 2 | 500 |
| 2,000 | M | 4 | 500 |
| 4,000 | L | 8 | 500 |
| 8,000 | XL | 16 | 500 |
| 16,000 | 2XL | 32 | 500 |
| 32,000 | 3XL | 64 | 500 |
| 64,000 | 4XL | 128 | 500 |
How to Find MPs Scanned:
```sql
-- Check query profile after running model
SELECT query_id, total_elapsed_time, partitions_scanned
FROM snowflake.account_usage.query_history
WHERE start_time >= dateadd(day, -1, current_timestamp())
  AND query_text ILIKE '%your_model_name%'
ORDER BY start_time DESC
LIMIT 1;
```
Practical Guidelines:
- Under-sized: If MPs per node > 1000, consider larger warehouse
- Over-sized: If MPs per node < 250, consider smaller warehouse
- Development: Start with XS-S, profile, then adjust
- Production: Size based on actual MP scan metrics from query history
Official Snowflake Docs: Warehouse Considerations
Generation 2 Standard Warehouses
Gen2 standard warehouses offer improved performance for most dbt workloads.
Why Gen2 is Better for dbt Projects:
- Faster transformations: Enhanced DELETE, UPDATE, MERGE, and table scan operations (critical for incremental models and snapshots)
- Delta Micropartitions: Snowflake does not rewrite entire micropartitions for changed data
- Faster underlying hardware: Most queries finish faster, and the warehouse can do more work concurrently
- Analytics optimization: Purpose-built for data engineering and analytics workloads
Converting to Gen2:
```sql
-- Run directly in Snowflake
ALTER WAREHOUSE TRANSFORMING_WH SET RESOURCE_CONSTRAINT = STANDARD_GEN_2;
```
Official Snowflake Docs: Gen2 Standard Warehouses
Search Optimization Service
For point lookups and selective filters on large tables:
```sql
{{ config(
    post_hook=[
        "alter table {{ this }} add search optimization on equality(customer_id, email)"
    ]
) }}
```
When to Use:
- Point lookups (WHERE customer_id = ?)
- Selective filters on large tables
- High-cardinality columns
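To check what search optimization is already configured on a table, you can describe it (a sketch; the table name is illustrative):

```sql
-- Lists the search optimization methods and target columns enabled on the table
describe search optimization on analytics.marts.dim_customers;
```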
Official Snowflake Docs: Search Optimization
Query Acceleration Service
Query Acceleration is configured at the warehouse level, not in dbt models:
```sql
-- Run directly in Snowflake
ALTER WAREHOUSE TRANSFORMING_WH SET ENABLE_QUERY_ACCELERATION = TRUE;

-- Set scale factor (optional)
ALTER WAREHOUSE TRANSFORMING_WH SET QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;
```
When to Use:
- Queries with unpredictable data volume
- Ad-hoc analytics workloads
- Queries that scan large portions of tables
- Variable query complexity
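To gauge whether a past query would have benefited before enabling the service, Snowflake can estimate the potential speedup from a query ID (a sketch; substitute a real query ID from query history):

```sql
-- Returns a JSON estimate of how much of the query was eligible for acceleration
select system$estimate_query_acceleration('<query_id>');
```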
Official Snowflake Docs: Query Acceleration
Result Caching
Snowflake automatically caches query results for 24 hours.
Best Practices:
- Use consistent query patterns to leverage cache
- Avoid unnecessary `current_timestamp()` in WHERE clauses (breaks cache)
- Identical queries return cached results instantly
Example to Preserve Cache:
```sql
-- ❌ Breaks cache every run
where created_at > current_timestamp() - interval '7 days'

-- ✅ Preserves cache (use dbt variables)
where created_at > '{{ var("lookback_date") }}'
```
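One way to supply that variable, assuming the `lookback_date` name from the example above (a sketch; adjust the value and scope for your project):

```yaml
# dbt_project.yml
vars:
  lookback_date: '2024-01-01'
```

It can also be overridden per run, e.g. `dbt run --vars '{"lookback_date": "2024-02-01"}'`.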
Incremental Model Performance
Efficient WHERE Clauses
```sql
{% if is_incremental() %}

  -- Alternative incremental filters (use one):

  -- ✅ Good: Partition pruning
  where order_date >= (select max(order_date) from {{ this }})

  -- ✅ Good: With lookback for late data
  where order_date >= dateadd(day, -3, (select max(order_date) from {{ this }}))

  -- ✅ Good: Limit source scan
  where order_date >= dateadd(day, -30, current_date())
    and order_date > (select max(order_date) from {{ this }})

{% endif %}
```
Incremental Strategy Performance
| Strategy | Speed | Use Case |
|---|---|---|
| append | Fastest | Immutable event data |
| merge | Medium | Updateable records |
| delete+insert | Fast | Partitioned data |
Choose based on data characteristics:
- Append: Event logs, clickstreams (never update)
- Merge: Orders, customers (updates possible)
- Delete+Insert: Date-partitioned aggregations
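Minimal configuration sketches for each strategy (model and key names are illustrative):

```sql
-- Append: no matching against existing rows (fastest)
{{ config(materialized='incremental', incremental_strategy='append') }}

-- Merge: matches on unique_key to update existing rows
{{ config(materialized='incremental', incremental_strategy='merge', unique_key='order_id') }}

-- Delete+insert: deletes rows matching the key, then inserts the new batch
{{ config(materialized='incremental', incremental_strategy='delete+insert', unique_key='order_date') }}
```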
Query Optimization Tips
Filter Before Joining
```sql
-- ✅ Good: Filter before joining
with filtered_orders as (
    select *
    from {{ ref('stg_orders') }}
    where order_date >= '2024-01-01'
)

select
    c.customer_id,
    count(o.order_id) as order_count
from {{ ref('dim_customers') }} c
join filtered_orders o on c.customer_id = o.customer_id
group by c.customer_id
```
Why It Works: Reduces join size, improves memory efficiency
Use QUALIFY for Window Functions
```sql
-- ✅ Good: Snowflake QUALIFY clause (cleaner & faster)
select
    customer_id,
    order_date,
    order_amount,
    row_number() over (partition by customer_id order by order_date desc) as rn
from {{ ref('stg_orders') }}
qualify rn <= 5  -- Top 5 orders per customer

-- ❌ Slower: Subquery approach
select *
from (
    select
        customer_id,
        order_date,
        row_number() over (partition by customer_id order by order_date desc) as rn
    from {{ ref('stg_orders') }}
)
where rn <= 5
```
Why QUALIFY is Better: Single scan, no subquery overhead
Pre-Aggregate Before Joining
```sql
-- ✅ Good: Aggregate first, then join
with order_metrics as (
    select
        customer_id,
        count(*) as order_count,
        sum(order_total) as lifetime_value
    from {{ ref('stg_orders') }}
    group by customer_id
)

select
    c.*,
    coalesce(m.order_count, 0) as order_count,
    coalesce(m.lifetime_value, 0) as lifetime_value
from {{ ref('dim_customers') }} c
left join order_metrics m on c.customer_id = m.customer_id
```
Why It Works: Reduces join size dramatically, avoids repeated aggregation
Avoid SELECT *
```sql
-- ❌ Bad: Reads all columns
select *
from {{ ref('fct_orders') }}
where order_date = current_date()

-- ✅ Good: Select only needed columns
select
    order_id,
    customer_id,
    order_date,
    order_amount
from {{ ref('fct_orders') }}
where order_date = current_date()
```
Why It Matters: Column pruning reduces data scanned and network transfer
Development vs Production
Limit Data in Development
Create macro for dev data limiting:
```sql
-- macros/limit_data_in_dev.sql
{% macro limit_data_in_dev(column_name, dev_days_of_data=3) %}
    {% if target.name == 'dev' %}
    where {{ column_name }} >= dateadd(day, -{{ dev_days_of_data }}, current_date())
    {% endif %}
{% endmacro %}
```
Usage:
```sql
select * from {{ ref('stg_orders') }}
{{ limit_data_in_dev('order_date', 7) }}
```
Target-Specific Configuration
```yaml
# dbt_project.yml
models:
  your_project:
    gold:
      # Use views in dev, tables in prod
      +materialized: "{{ 'view' if target.name == 'dev' else 'table' }}"
```
Benefits:
- Faster dev builds
- Lower dev costs
- Prod stays optimized
Query Profiling
Snowflake Query Profile
View in Snowflake UI:
- Go to Query History
- Select your query
- Click "Query Profile" tab
- Analyze execution plan
Query history with performance metrics:
```sql
select
    query_id,
    query_text,
    execution_time,
    bytes_scanned,
    warehouse_name,
    partitions_scanned
from snowflake.account_usage.query_history
where user_name = current_user()
  and start_time >= dateadd(day, -7, current_date())
order by execution_time desc
limit 100;
```
dbt Timing Information
```bash
# Run with timing details
dbt run --select model_name --log-level debug

# View run timing
cat target/run_results.json | jq '.results[].execution_time'
```
Analyze slow models:
```bash
# Find slowest models
cat target/run_results.json | jq -r '.results[] | [.execution_time, .unique_id] | @tsv' | sort -rn | head -10
```
Performance Checklist
Model-Level Optimization
- Appropriate materialization chosen (ephemeral/table/incremental)
- Clustering keys applied for large tables (1-4 columns)
- Incremental strategy optimized (append/merge/delete+insert)
- WHERE clauses filter early (before joins)
- JOINs are necessary and optimized (filter before join)
- SELECT only needed columns (no SELECT *)
- Window functions use QUALIFY when possible
Project-Level Optimization
- Staging models are ephemeral (no storage overhead)
- Large facts are incremental (faster builds)
- Dev environment uses limited data (faster iteration)
- Warehouse sizing appropriate per model complexity
- Gen2 warehouses enabled for transformations
- Regular performance reviews scheduled
- Clustering monitored and maintained
Helping Users with Performance
Strategy for Assisting Users
When users report performance issues:
- Identify the bottleneck: Build time? Query time? Both?
- Check materialization: Is it appropriate for model size/purpose?
- Review query patterns: Are there obvious inefficiencies?
- Assess warehouse sizing: Using appropriate compute for workload?
- Recommend optimizations: Specific, actionable improvements
- Provide examples: Working code with performance comparisons
Common User Questions
"My model is slow to build"
- Check materialization: Should it be incremental?
- Review warehouse size: Appropriate for data volume?
- Analyze query: Are there inefficient patterns?
- Check clustering: Would it help query performance?
"How do I make this faster?"
- Change ephemeral to table if reused multiple times
- Convert table to incremental for large datasets
- Add clustering keys for frequently filtered columns
- Pre-aggregate before joining
- Use QUALIFY instead of subqueries
"What warehouse size should I use?"
- Profile the query to see MPs scanned
- Aim for ~500 MPs per warehouse node
- Start small, scale up based on actual metrics
- Use the `snowflake_warehouse` config for model-specific sizing
Related Official Documentation
- Snowflake Docs: Query Performance
- Snowflake Docs: Clustering
- Snowflake Docs: Gen2 Warehouses
- dbt Docs: Best Practices
Goal: Transform AI agents into expert dbt performance optimizers who identify bottlenecks, recommend appropriate optimizations, and implement Snowflake-specific features for production-grade performance.
