dbt-materializations
Choosing and implementing dbt materializations (ephemeral, view, table, incremental, snapshots, Python models). Use this skill when deciding on materialization strategy, implementing incremental models, setting up snapshots for SCD Type 2 tracking, or creating Python models for machine learning workloads.
dbt Materializations
Purpose
Transform AI agents into experts on dbt materializations, providing guidance on choosing the right materialization strategy based on model purpose, size, update frequency, and query patterns, plus implementation details for each type including advanced features like snapshots and Python models.
When to Use This Skill
Activate this skill when users ask about:
- Choosing the right materialization for a model
- Implementing incremental models with merge/append/delete+insert strategies
- Setting up snapshots for SCD Type 2 historical tracking
- Converting table materializations to incremental
- Creating Python models for ML or advanced analytics
- Understanding trade-offs between ephemeral, view, and table
- Optimizing materialization performance
- Implementing slowly changing dimensions
Official dbt Documentation: Materializations
Decision Matrix
| Materialization | Use Case | Build Time | Storage | Query Speed | Best For |
|---|---|---|---|---|---|
| ephemeral | Staging, reusable logic | Fast (CTE) | None | N/A | Bronze layer |
| view | Simple transforms | Fast | Minimal | Slow | Always-fresh data |
| table | Complex logic | Slow | High | Fast | Dimensions |
| incremental | Large datasets | Fast | Medium | Fast | Large facts |
Ephemeral Materialization
When to Use: Staging models, reusable intermediate logic that doesn't need to be queried directly
```sql
{{ config(materialized='ephemeral') }}

select
    customer_id,
    customer_name,
    upper(trim(email)) as email_clean
from {{ source('crm', 'customers') }}
```
How it Works:
- Compiled as CTE in downstream models
- No physical table created
- Zero storage cost
- Cannot be queried directly
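For illustration, a downstream model that calls ref() on the ephemeral model above compiles to roughly the sketch below; the selecting model and the CTE alias are assumptions here, and the exact alias prefix varies by dbt version:

```sql
-- Hypothetical downstream model after compilation (sketch)
with __dbt__cte__stg_customers as (
    select
        customer_id,
        customer_name,
        upper(trim(email)) as email_clean
    from crm.customers
)
select
    customer_id,
    email_clean
from __dbt__cte__stg_customers
```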
Best For:
- Bronze/staging layer models
- Reusable intermediate transformations
- Models referenced by only 1-2 downstream models
Performance Note: If an ephemeral model is referenced by many downstream models or contains
complex logic, consider changing to table materialization to avoid recomputing.
View Materialization
When to Use: Simple transformations where you always need fresh data and query performance isn't critical
```sql
{{ config(materialized='view') }}

select
    customer_id,
    count(*) as order_count,
    sum(order_amount) as total_spent
from {{ ref('stg_orders') }}
group by customer_id
```
How it Works:
- Creates database view
- Query runs every time view is accessed
- Minimal storage (just view definition)
- Always shows latest data
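Behind the scenes, dbt wraps the compiled SELECT in view DDL; on Snowflake the statement is roughly the following sketch (database, schema, and relation names are illustrative):

```sql
-- Approximate DDL issued for a view materialization (sketch)
create or replace view analytics.customer_order_summary as
select
    customer_id,
    count(*) as order_count,
    sum(order_amount) as total_spent
from analytics.stg_orders
group by customer_id;
```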
Best For:
- Simple aggregations
- Always-fresh reporting needs
- Development/prototyping
When to Avoid:
- Complex transformations
- Frequently queried models
- Performance-critical queries
Table Materialization
When to Use: Dimensions, complex transformations, frequently queried models
```sql
{{ config(
    materialized='table',
    cluster_by=['first_order_date']  -- Snowflake optimization (must be an output column)
) }}

select
    customer_id,
    customer_name,
    count(distinct order_id) as lifetime_orders,
    sum(order_amount) as lifetime_value,
    min(order_date) as first_order_date
from {{ ref('stg_customers') }} c
join {{ ref('stg_orders') }} o using (customer_id)
group by customer_id, customer_name
```
How it Works:
- Drops and recreates table on every run (DROP + CREATE TABLE AS)
- Full refresh every time
- Fast query performance
- Higher storage cost
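On Snowflake the table materialization boils down to a CREATE OR REPLACE TABLE ... AS SELECT over the compiled model; a simplified sketch with illustrative names (the adapter may use a temporary/swap pattern depending on version):

```sql
-- Approximate DDL issued for a table materialization (sketch)
create or replace table analytics.dim_customers as
select
    customer_id,
    count(distinct order_id) as lifetime_orders
from analytics.stg_orders
group by customer_id;
```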
Best For:
- Dimension tables
- Gold layer marts
- Complex silver layer transformations
- Models with frequent queries
Performance Optimization:
```sql
{{ config(
    materialized='table',
    cluster_by=['date_column', 'category'],  -- Snowflake clustering
    snowflake_warehouse='LARGE_WH'           -- Custom warehouse for complex logic
) }}
```
Incremental Materialization
When to Use: Large fact tables (millions+ rows), time-series data, event logs, append-only data
```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    merge_exclude_columns=['dbt_inserted_at'],  -- Preserve the original insert timestamp
    cluster_by=['order_date', 'customer_id']
) }}

select
    order_id,
    customer_id,
    order_date,
    order_amount,
    order_status,
    current_timestamp() as dbt_inserted_at,  -- Kept from the first insert via merge_exclude_columns
    current_timestamp() as dbt_updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
    -- Only process new/updated records
    where order_date > (select max(order_date) from {{ this }})
{% endif %}
```
How it Works:
- First run: Loads all data (acts like table)
- Subsequent runs: Insert/update only new records
- Uses unique_key to identify records
- Dramatically faster builds for large tables
Performance Benefits:
- Reduces build time from hours to minutes
- Lower compute costs
- Enables more frequent refreshes
Incremental Strategies
1. Merge Strategy (Most Common)
Use For: Updateable records, handling late-arriving data, SCD Type 1
```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    merge_exclude_columns=['dbt_inserted_at']  -- Preserve original timestamp
) }}

select
    order_id,
    customer_id,
    order_status,  -- Can change over time
    order_amount,
    updated_at,    -- Kept so the incremental filter below can compare against the existing table
    current_timestamp() as dbt_inserted_at,  -- Preserved on updates via merge_exclude_columns
    current_timestamp() as dbt_updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
    where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```
How It Works:
- Matches on unique_key
- Updates existing records
- Inserts new records
- Preserves columns listed in merge_exclude_columns
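On Snowflake this compiles to a MERGE statement along the lines of the sketch below; the target and staging relation names are illustrative, the column list is abbreviated, and the actual statement is generated from the model's columns:

```sql
-- Approximate statement generated by incremental_strategy='merge' (sketch)
merge into analytics.fct_orders as dbt_target
using analytics.fct_orders__dbt_tmp as dbt_source
    on dbt_target.order_id = dbt_source.order_id
when matched then update set
    dbt_target.order_status   = dbt_source.order_status,
    dbt_target.order_amount   = dbt_source.order_amount,
    dbt_target.dbt_updated_at = dbt_source.dbt_updated_at
    -- dbt_inserted_at is skipped because it is listed in merge_exclude_columns
when not matched then insert
    (order_id, customer_id, order_status, order_amount, dbt_inserted_at, dbt_updated_at)
    values (dbt_source.order_id, dbt_source.customer_id, dbt_source.order_status,
            dbt_source.order_amount, dbt_source.dbt_inserted_at, dbt_source.dbt_updated_at);
```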
2. Append Strategy (Fastest)
Use For: Immutable event data, logs, clickstreams
```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='append'
) }}

select
    event_id,
    user_id,
    event_type,
    event_timestamp,
    event_properties
from {{ ref('stg_events') }}

{% if is_incremental() %}
    where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```
How It Works:
- Only inserts new records
- No updates or deletes
- Fastest incremental strategy
- Assumes data is immutable
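The compiled statement is essentially a plain INSERT from the temporary relation dbt builds for the batch; a sketch with illustrative names:

```sql
-- Approximate statement generated by incremental_strategy='append' (sketch)
insert into analytics.fct_events
    (event_id, user_id, event_type, event_timestamp, event_properties)
select event_id, user_id, event_type, event_timestamp, event_properties
from analytics.fct_events__dbt_tmp;
```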
3. Delete+Insert Strategy
Use For: Partitioned data, date-based reprocessing
```sql
{{ config(
    materialized='incremental',
    unique_key='order_date',
    incremental_strategy='delete+insert'
) }}

select
    order_date,
    customer_id,
    count(*) as daily_orders,
    sum(order_amount) as daily_revenue
from {{ ref('stg_orders') }}

{% if is_incremental() %}
    -- Reprocess the last 7 days
    where order_date >= dateadd(day, -7, current_date())
{% endif %}

group by order_date, customer_id
```
How It Works:
- Deletes records matching unique_key values
- Inserts all new records
- Good for reprocessing entire partitions
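Conceptually, dbt deletes any existing rows whose unique_key values appear in the new batch and then inserts the batch, roughly as in this sketch (relation names illustrative):

```sql
-- Approximate statements generated by incremental_strategy='delete+insert' (sketch)
delete from analytics.fct_daily_orders
where order_date in (select distinct order_date from analytics.fct_daily_orders__dbt_tmp);

insert into analytics.fct_daily_orders
select * from analytics.fct_daily_orders__dbt_tmp;
```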
Official dbt Docs: Incremental Models
Incremental Best Practices
1. Always Include is_incremental() Check
```sql
{% if is_incremental() %}
    where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```
2. Add Lookback for Late Data
```sql
{% if is_incremental() %}
    where order_date >= dateadd(day, -3, (select max(order_date) from {{ this }}))
{% endif %}
```
3. Limit Source Scans
```sql
{% if is_incremental() %}
    -- Only scan recent source data
    where source_updated_at >= dateadd(day, -30, current_date())
      and source_updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```
4. Use Clustering for Performance
```sql
{{ config(
    cluster_by=['event_date', 'user_id']  -- Commonly filtered/joined columns
) }}
```
5. Handle Full Refresh
```bash
# Force rebuild from scratch
dbt build --full-refresh --select model_name
```
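For very large incremental models where an accidental full refresh would be costly, dbt's model-level full_refresh config can block the flag; a minimal sketch, with the model's other settings assumed:

```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    full_refresh=false  -- Ignore --full-refresh for this model unless the config is changed
) }}
```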
Snapshots (SCD Type 2)
Purpose: Track historical changes to slowly changing dimensions
When to Use: Customer attributes, product catalogs, employee records, pricing history
```sql
-- snapshots/dim_customers_scd.sql
{% snapshot dim_customers_scd %}

{{ config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at',
    invalidate_hard_deletes=True
) }}

select * from {{ ref('stg_customers') }}

{% endsnapshot %}
```
Generated Columns:
- dbt_valid_from - When record became active
- dbt_valid_to - When record was superseded (NULL for current)
- dbt_scd_id - Unique identifier for each version
- dbt_updated_at - Last snapshot processing time
Snapshot Strategies
Timestamp Strategy (Recommended)
Use When: Source has reliable updated_at timestamp
```sql
{% snapshot customers_snapshot_timestamp %}

{{ config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at'
) }}

select * from {{ source('crm', 'customers') }}

{% endsnapshot %}
```
Advantages:
- Faster performance
- Only checks timestamp
- More efficient
Check Strategy
Use When: No reliable timestamp, need to check specific columns
```sql
{% snapshot customers_snapshot_check %}

{{ config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='check',
    check_cols=['customer_name', 'customer_email', 'customer_tier']
) }}

select * from {{ source('crm', 'customers') }}

{% endsnapshot %}
```
Advantages:
- Works without timestamp
- Tracks specific column changes
- More control over what triggers new version
Querying Snapshots
Get Current Records Only:
```sql
select *
from {{ ref('dim_customers_scd') }}
where dbt_valid_to is null
```
Point-in-Time Query:
```sql
select *
from {{ ref('dim_customers_scd') }}
where '2024-01-15' between dbt_valid_from and coalesce(dbt_valid_to, '9999-12-31')
```
Change History:
```sql
select
    customer_id,
    customer_name,
    dbt_valid_from,
    dbt_valid_to
from {{ ref('dim_customers_scd') }}
order by customer_id, dbt_valid_from
```
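A related pattern is joining fact rows to the attribute version that was in effect on the fact's date; a sketch assuming a fct_orders model and a customer_tier column, neither of which is defined above:

```sql
-- Point-in-time join of orders to the customer version valid on the order date (sketch)
select
    o.order_id,
    o.order_date,
    c.customer_name,
    c.customer_tier
from {{ ref('fct_orders') }} o
join {{ ref('dim_customers_scd') }} c
    on o.customer_id = c.customer_id
   and o.order_date >= c.dbt_valid_from
   and o.order_date < coalesce(c.dbt_valid_to, '9999-12-31')
```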
Running Snapshots:
```bash
dbt snapshot                               # Runs all snapshots
dbt snapshot --select dim_customers_scd    # Specific snapshot
```
Official dbt Docs: Snapshots
Python Models
Purpose: Machine learning, statistical analysis, complex transformations beyond SQL
When to Use: ML models, clustering, advanced analytics, Python library integration (pandas, scikit-learn, etc.)
```python
# models/silver/customer_clustering.py
def model(dbt, session):
    """Cluster customers using K-Means"""
    dbt.config(
        materialized="table",
        packages=["scikit-learn", "pandas"]
    )

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Get data from dbt model
    df = dbt.ref("int_customers__metrics").to_pandas()

    # Select features
    features = ['total_orders', 'lifetime_value', 'avg_order_value']
    X = df[features].fillna(0)

    # Standardize features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # Perform clustering
    kmeans = KMeans(n_clusters=5, random_state=42)
    df['cluster_id'] = kmeans.fit_predict(X_scaled)
    df['cluster_label'] = df['cluster_id'].map({
        0: 'Low Value',
        1: 'Medium Value',
        2: 'High Value',
        3: 'VIP',
        4: 'At Risk'
    })

    # Return final dataframe
    return df[['customer_id', 'cluster_id', 'cluster_label'] + features]
```
Python Model Best Practices
1. Use SQL for Data Preparation
```python
# Let SQL handle filtering, joins, aggregations
df = dbt.ref("int_customers__prepared").to_pandas()
```
2. Python for ML/Complex Analytics Only
```python
# Don't use Python for simple transformations
# Use SQL instead
```
3. Specify Required Packages
```python
dbt.config(
    packages=["scikit-learn==1.3.0", "pandas", "numpy"]
)
```
4. Test Python Models
```yaml
# Can use standard dbt tests
models:
  - name: customer_clustering
    columns:
      - name: customer_id
        tests:
          - dbt_constraints.primary_key
      - name: cluster_id
        tests:
          - not_null
          - accepted_values:
              values: [0, 1, 2, 3, 4]
```
Official dbt Docs: Python Models
When to Change Materializations
Change Ephemeral/View to Table When:
- Model is Referenced Multiple Times
  - If 3+ downstream models reference it
  - Avoids recomputing same logic
- Complex Transformations
  - Heavy aggregations or window functions
  - Self-joins or complex CTEs
- Memory Issues
  - Queries failing due to memory constraints
  - CTE becoming too large
Change Table to Incremental When:
- Large Data Volumes
  - Table has millions+ rows
  - Full refresh takes > 5 minutes
- Time-Series Data
  - Append-only event logs
  - Daily/hourly data loads
- Performance Requirements
  - Need faster, more frequent refreshes
  - Cost optimization needed
Folder-Level Configuration
Configure materializations at folder level in dbt_project.yml:
```yaml
models:
  your_project:
    bronze:
      +materialized: ephemeral
      +tags: ["bronze", "staging"]
    silver:
      +materialized: ephemeral
      +tags: ["silver"]
    gold:
      +materialized: table
      +tags: ["gold", "marts"]
```
Override at model level only for special cases (incremental, Python, etc.).
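For example, a single model under gold/ can override the folder default with its own config block; this sketch assumes a fct_orders model that needs incremental behavior:

```sql
-- models/gold/fct_orders.sql
{{ config(
    materialized='incremental',   -- Overrides the folder-level +materialized: table
    unique_key='order_id',
    incremental_strategy='merge'
) }}

select
    order_id,
    customer_id,
    order_date,
    order_amount
from {{ ref('stg_orders') }}
```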
Helping Users Choose Materializations
Strategy for Assisting Users
When users ask about materializations:
- Understand purpose: What is the model for?
- Assess size: How many rows? How often updated?
- Check usage: How many downstream models? Query frequency?
- Recommend materialization: Based on layer and requirements
- Provide configuration: Complete working example
- Explain trade-offs: Build time, storage, query performance
Common User Questions
"Should this be ephemeral or table?"
- Ephemeral: Staging, lightweight intermediate models
- Table: Dimensions, marts, complex intermediate models
- Consider: Reusability, complexity, downstream usage
"When should I use incremental?"
- Large tables (millions+ rows)
- Time-series or append-only data
- Performance/cost optimization needed
- Full rebuild takes too long
"How do I set up SCD Type 2?"
- Use snapshots with timestamp or check strategy
- Configure unique_key and updated_at
- Query with dbt_valid_to IS NULL for current records
Goal: Transform AI agents into experts on dbt materializations who guide users to optimal materialization choices based on data characteristics, usage patterns, and performance requirements.
