Calabi Transform Models
A model in Calabi Transform is a SQL SELECT statement saved as a .sql file. When you run Calabi Transform, each model is compiled and executed against your data warehouse, materialising the result as a table, view, or other structure. Models are the core building block of the transformation layer — they define how raw data becomes analytics-ready.
Model Anatomy
Every model is a plain SQL file that contains a single SELECT statement. There is no CREATE TABLE or INSERT INTO — Calabi Transform handles the DDL automatically based on the configured materialisation.
-- models/staging/stg_orders.sql
select
id as order_id,
customer_id,
status,
amount / 100.0 as amount_dollars,
cast(created_at as timestamp) as created_at,
_etl_loaded_at
from {{ source('jaffle_shop', 'orders') }}
where status != 'deleted'
Key observations:
- The model references a raw source table using the {{ source() }} function.
- Column transformations (type casting, renaming, normalisation) are applied inline.
- No DDL is written — Calabi Transform infers this from the materialisation setting.
The ref() Function
The ref() function is the primary way models reference other models. Using ref() instead of hardcoded table names gives Calabi Transform three important capabilities:
- Dependency resolution — Calabi Transform builds a DAG of model dependencies and executes them in the correct order.
- Cross-environment portability — ref() resolves to the correct schema (dev, staging, production) based on your target configuration.
- Lineage tracking — Calabi Catalogue automatically visualises the upstream/downstream relationships between models.
-- models/intermediate/int_orders_with_customers.sql
select
o.order_id,
o.amount_dollars,
o.created_at,
c.customer_name,
c.email,
c.country
from {{ ref('stg_orders') }} o
left join {{ ref('stg_customers') }} c
on o.customer_id = c.customer_id
Here, int_orders_with_customers depends on both stg_orders and stg_customers. Calabi Transform will build those two models before building this one.
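Because the DAG is known, you can ask for a model together with its ancestors rather than running the whole project. A minimal sketch, assuming the CLI follows dbt-style graph selection (verify the exact syntax against your installed version):

```shell
# Build int_orders_with_customers plus everything upstream of it.
# The leading "+" selects the model's ancestors (dbt-style graph
# selection -- assumed syntax, check your CLI's documentation).
dbt run --select +int_orders_with_customers
```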
Model Layers (Naming Conventions)
A well-structured Calabi Transform project organises models into layers by their purpose. The recommended three-layer architecture is:
models/
├── staging/ # One-to-one with raw source tables; light cleaning only
│ ├── stg_orders.sql
│ └── stg_customers.sql
├── intermediate/ # Business logic that joins or reshapes staging models
│ ├── int_orders_with_customers.sql
│ └── int_daily_revenue.sql
└── marts/ # Wide, analytics-ready tables consumed by BI tools
├── fct_orders.sql
└── dim_customers.sql
| Layer | Prefix | Purpose | Consumers |
|---|---|---|---|
| Staging | stg_ | Select and lightly clean one source table. No joins. | Intermediate models |
| Intermediate | int_ | Apply business logic: joins, aggregations, pivots. | Mart models |
| Marts | fct_, dim_ | Final wide tables and fact/dimension tables for reporting. | BI tools, analysts |
Layering makes the project navigable and testable. A bug in a staging model affects all downstream consumers — you fix it once. Intermediate models isolate complex logic. Marts are simple and fast to query.
Materialisations
The materialisation controls how Calabi Transform persists a model's results in the warehouse. Set it in the model's config block or in dbt_project.yml.
View (default)
{{ config(materialized='view') }}
select * from {{ ref('stg_orders') }}
- Calabi Transform creates (or replaces) a database view.
- No data is stored — the query runs at read time.
- Zero storage cost, but the query re-runs on every read, so complex models that are queried frequently can be slow.
- Best for: staging models, simple transformations, development.
Table
{{ config(materialized='table') }}
select
date_trunc('month', created_at) as month,
sum(amount_dollars) as revenue
from {{ ref('int_orders_with_customers') }}
group by 1
- Calabi Transform drops and recreates a physical table on every run.
- Fast to query; consumes warehouse storage.
- Best for: mart models queried by BI tools, aggregations, reporting tables.
Incremental
{{ config(
materialized='incremental',
unique_key='order_id',
on_schema_change='sync_all_columns'
) }}
select
order_id,
customer_id,
amount_dollars,
created_at
from {{ ref('stg_orders') }}
{% if is_incremental() %}
-- Only process rows newer than the last run
where created_at > (select max(created_at) from {{ this }})
{% endif %}
- On the first run, behaves like a full table build.
- On subsequent runs, only processes new/changed rows and merges (upserts) them into the existing table using unique_key.
- Best for: large event tables, append-heavy tables, tables where a full rebuild would be prohibitively slow.
- Requires a reliable cursor column (created_at, updated_at) that identifies new rows.
| Parameter | Description |
|---|---|
| unique_key | Column(s) used to match new rows to existing rows for upsert |
| on_schema_change | What to do when columns are added/removed: ignore, fail, append_new_columns, sync_all_columns |
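A strict `>` cursor filter can miss late-arriving or updated rows. One common refinement is to reprocess a short lookback window and let unique_key deduplicate on merge. A sketch, assuming stg_orders also exposes an updated_at column (not shown in the earlier example) and a warehouse that accepts this interval syntax:

```sql
-- Sketch: reprocess a 3-day lookback window so late-arriving updates
-- are picked up; unique_key='order_id' deduplicates on merge.
-- Assumes an updated_at column exists; interval syntax varies by warehouse.
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    amount_dollars,
    created_at,
    updated_at
from {{ ref('stg_orders') }}
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }}) - interval '3 days'
{% endif %}
```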
Ephemeral
{{ config(materialized='ephemeral') }}
select
order_id,
amount_dollars * 1.1 as amount_with_tax
from {{ ref('stg_orders') }}
- Not persisted to the warehouse at all — compiled as a Common Table Expression (CTE) inline within the models that reference it.
- Zero storage, zero warehouse objects; increases complexity of compiled SQL.
- Best for: small intermediate transformations that are used in exactly one downstream model.
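When a downstream model references an ephemeral model, the compiled SQL inlines the ephemeral query as a CTE instead of resolving it to a warehouse object. Roughly, and only as a sketch (the CTE alias, schema names, and model name here are illustrative, not exact compiler output):

```sql
-- Approximate compiled output of a model that refs the ephemeral
-- model above. The ephemeral SQL becomes a CTE; no view or table
-- named after it ever exists in the warehouse.
with orders_with_tax as (
    select
        order_id,
        amount_dollars * 1.1 as amount_with_tax
    from analytics.staging.stg_orders
)
select
    order_id,
    amount_with_tax
from orders_with_tax
where amount_with_tax > 100
```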
Materialisation Comparison
| Materialisation | Storage | Query Speed | Rebuild Time | Best For |
|---|---|---|---|---|
| view | None | Slowest | Instant | Staging, development |
| table | Full | Fastest | Slow (full rebuild) | Marts, BI-facing |
| incremental | Partial delta | Fast | Fast (incremental) | Large fact tables |
| ephemeral | None | Inline with parent | Instant | Single-use CTEs |
Model Configuration
Configuration can be set at three levels (most specific wins):
1. In the model SQL file (config block)
{{ config(
materialized='incremental',
schema='marts',
tags=['daily', 'finance'],
unique_key='order_id',
post_hook="GRANT SELECT ON {{ this }} TO ROLE reporter"
) }}
select ...
2. In dbt_project.yml (project-wide defaults)
models:
my_project:
staging:
+materialized: view
+schema: staging
intermediate:
+materialized: view
+schema: intermediate
marts:
+materialized: table
+schema: marts
+tags: ["marts"]
3. In schema.yml files (documentation + tests, not materialisation)
models:
- name: fct_orders
description: "One row per order, fully enriched with customer data."
columns:
- name: order_id
description: "Primary key — unique identifier for each order."
data_tests:
- unique
- not_null
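To see "most specific wins" in practice: the dbt_project.yml above sets +materialized: table for everything under marts/, but an in-file config block overrides it for one model. A sketch using a hypothetical model name (fct_events and stg_events are illustrative, not from this project):

```sql
-- models/marts/fct_events.sql (hypothetical)
-- Project default for marts/ is table, but the in-file config is
-- more specific, so this one model builds incrementally instead.
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    event_type,
    occurred_at
from {{ ref('stg_events') }}
{% if is_incremental() %}
where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}
```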
Useful Config Options
| Config Key | Values | Description |
|---|---|---|
| materialized | view, table, incremental, ephemeral | How to persist the model |
| schema | String | Override the schema for this model |
| database | String | Override the database (multi-database projects) |
| alias | String | Override the table/view name in the warehouse |
| tags | List | Labels for selecting model subsets (dbt run --select tag:finance) |
| enabled | true / false | Disable a model without deleting it |
| pre_hook | SQL string | SQL to run before the model |
| post_hook | SQL string | SQL to run after the model (e.g., grants, stats) |
| unique_key | Column(s) | For incremental models — the merge key |
| partition_by | Dict | BigQuery/Snowflake partitioning config |
| cluster_by | List | BigQuery/Snowflake clustering config |
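Tags are most useful at run time, where they let you execute a subset of the project. A sketch, assuming dbt-style selection flags as referenced in the table above (confirm against your installed CLI):

```shell
# Run only models tagged 'finance' (e.g., the fct_orders-style marts).
dbt run --select tag:finance

# Or run everything except the daily-scheduled models.
dbt run --exclude tag:daily
```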
Full Working Example
A complete staging → intermediate → mart pipeline:
-- models/staging/stg_payments.sql
{{ config(materialized='view') }}
select
id as payment_id,
order_id,
payment_method,
amount / 100.0 as amount_dollars,
status,
cast(created_at as timestamp) as created_at
from {{ source('stripe', 'payments') }}
-- models/intermediate/int_order_payments.sql
{{ config(materialized='view') }}
select
order_id,
sum(amount_dollars) as total_paid,
max(case when payment_method = 'bank_transfer' then 1 end) = 1 as paid_by_bank,
count(*) as payment_count
from {{ ref('stg_payments') }}
where status = 'success'
group by order_id
-- models/marts/fct_orders.sql
{{ config(materialized='table', tags=['mart', 'finance']) }}
select
o.order_id,
o.customer_id,
o.created_at,
o.amount_dollars as order_amount,
coalesce(p.total_paid, 0) as total_paid,
o.amount_dollars - coalesce(p.total_paid, 0) as outstanding_balance,
p.payment_count,
p.paid_by_bank,
c.customer_name,
c.country
from {{ ref('stg_orders') }} o
left join {{ ref('int_order_payments') }} p on o.order_id = p.order_id
left join {{ ref('stg_customers') }} c on o.customer_id = c.customer_id
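In practice the mart would ship with a schema.yml entry alongside it. A sketch in the same style as the earlier schema.yml example (the file path and the total_paid tests are illustrative additions, not from the project above):

```yaml
# models/marts/schema.yml -- hypothetical companion file for fct_orders
models:
  - name: fct_orders
    description: "One row per order with payment totals and customer data."
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: total_paid
        description: "Sum of successful payments; 0 when no payments exist."
        data_tests:
          - not_null
```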
Next Steps
- Sources & Raw Data — Define and test your raw source tables
- Data Tests — Validate model output quality
- Running Transformations — Execute models and check results