Skip to main content

Calabi Transform Models

Professional+

A model in Calabi Transform is a SQL SELECT statement saved as a .sql file. When you run Calabi Transform, each model is compiled and executed against your data warehouse, materialising the result as a table, view, or other structure. Models are the core building block of the transformation layer — they define how raw data becomes analytics-ready.


Model Anatomy

Every model is a plain SQL file that contains a single SELECT statement. There is no CREATE TABLE or INSERT INTO — Calabi Transform handles the DDL automatically based on the configured materialisation.

-- models/staging/stg_orders.sql

select
id as order_id,
customer_id,
status,
amount / 100.0 as amount_dollars,
cast(created_at as timestamp) as created_at,
_etl_loaded_at
from {{ source('jaffle_shop', 'orders') }}
where status != 'deleted'

Key observations:

  • The model references a raw source table using the {{ source() }} function.
  • Column transformations (type casting, renaming, normalisation) are applied inline.
  • No DDL is written — Calabi Transform infers this from the materialisation setting.

The ref() Function

The ref() function is the primary way models reference other models. Using ref() instead of hardcoded table names gives Calabi Transform three important capabilities:

  1. Dependency resolution — Calabi Transform builds a DAG of model dependencies and executes them in the correct order.
  2. Cross-environment portabilityref() resolves to the correct schema (dev, staging, production) based on your target configuration.
  3. Lineage tracking — Calabi Catalogue automatically visualises the upstream/downstream relationships between models.
-- models/intermediate/int_orders_with_customers.sql

select
o.order_id,
o.amount_dollars,
o.created_at,
c.customer_name,
c.email,
c.country
from {{ ref('stg_orders') }} o
left join {{ ref('stg_customers') }} c
on o.customer_id = c.customer_id

Here, int_orders_with_customers depends on both stg_orders and stg_customers. Calabi Transform will build those two models before building this one.


Model Layers (Naming Conventions)

A well-structured Calabi Transform project organises models into layers by their purpose. The recommended three-layer architecture is:

models/
├── staging/ # One-to-one with raw source tables; light cleaning only
│ ├── stg_orders.sql
│ └── stg_customers.sql
├── intermediate/ # Business logic that joins or reshapes staging models
│ ├── int_orders_with_customers.sql
│ └── int_daily_revenue.sql
└── marts/ # Wide, analytics-ready tables consumed by BI tools
├── fct_orders.sql
└── dim_customers.sql
LayerPrefixPurposeConsumers
Stagingstg_Select and lightly clean one source table. No joins.Intermediate models
Intermediateint_Apply business logic: joins, aggregations, pivots.Mart models
Martsfct_, dim_Final wide tables and fact/dimension tables for reporting.BI tools, analysts
Why layer your models?

Layering makes the project navigable and testable. A bug in a staging model affects all downstream consumers — you fix it once. Intermediate models isolate complex logic. Marts are simple and fast to query.


Materialisations

The materialisation controls how Calabi Transform persists a model's results in the warehouse. Set it in the model's config block or in dbt_project.yml.

View (default)

{{ config(materialized='view') }}

select * from {{ ref('stg_orders') }}
  • Calabi Transform creates (or replaces) a database view.
  • No data is stored — the query runs at read time.
  • Zero storage cost; slow for complex queries called frequently.
  • Best for: staging models, simple transformations, development.

Table

{{ config(materialized='table') }}

select
date_trunc('month', created_at) as month,
sum(amount_dollars) as revenue
from {{ ref('int_orders_with_customers') }}
group by 1
  • Calabi Transform drops and recreates a physical table on every run.
  • Fast to query; consumes warehouse storage.
  • Best for: mart models queried by BI tools, aggregations, reporting tables.

Incremental

{{ config(
materialized='incremental',
unique_key='order_id',
on_schema_change='sync_all_columns'
) }}

select
order_id,
customer_id,
amount_dollars,
created_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- Only process rows newer than the last run
where created_at > (select max(created_at) from {{ this }})
{% endif %}
  • On the first run, behaves like a full table build.
  • On subsequent runs, only processes new/changed rows and merges (upserts) them into the existing table using unique_key.
  • Best for: large event tables, append-heavy tables, tables where a full rebuild would be prohibitively slow.
  • Requires a reliable cursor column (created_at, updated_at) that identifies new rows.
ParameterDescription
unique_keyColumn(s) used to match new rows to existing rows for upsert
on_schema_changeWhat to do when columns are added/removed: ignore, fail, append_new_columns, sync_all_columns

Ephemeral

{{ config(materialized='ephemeral') }}

select
order_id,
amount_dollars * 1.1 as amount_with_tax
from {{ ref('stg_orders') }}
  • Not persisted to the warehouse at all — compiled as a Common Table Expression (CTE) inline within the models that reference it.
  • Zero storage, zero warehouse objects; increases complexity of compiled SQL.
  • Best for: small intermediate transformations that are used in exactly one downstream model.

Materialisation Comparison

MaterialisationStorageQuery SpeedRebuild TimeBest For
viewNoneSlowestInstantStaging, development
tableFullFastestSlow (full rebuild)Marts, BI-facing
incrementalPartial deltaFastFast (incremental)Large fact tables
ephemeralNoneInline with parentInstantSingle-use CTEs

Model Configuration

Configuration can be set at three levels (most specific wins):

1. In the model SQL file (config block)

{{ config(
materialized='incremental',
schema='marts',
tags=['daily', 'finance'],
unique_key='order_id',
post_hook="GRANT SELECT ON {{ this }} TO ROLE reporter"
) }}

select ...

2. In dbt_project.yml (project-wide defaults)

models:
my_project:
staging:
+materialized: view
+schema: staging
intermediate:
+materialized: view
+schema: intermediate
marts:
+materialized: table
+schema: marts
+tags: ["marts"]

3. In schema.yml files (documentation + tests, not materialisation)

models:
- name: fct_orders
description: "One row per order, fully enriched with customer data."
columns:
- name: order_id
description: "Primary key — unique identifier for each order."
data_tests:
- unique
- not_null

Useful Config Options

Config KeyValuesDescription
materializedview, table, incremental, ephemeralHow to persist the model
schemaStringOverride the schema for this model
databaseStringOverride the database (multi-database projects)
aliasStringOverride the table/view name in the warehouse
tagsListLabels for selecting model subsets (dbt run --select tag:finance)
enabledtrue / falseDisable a model without deleting it
pre_hookSQL stringSQL to run before the model
post_hookSQL stringSQL to run after the model (e.g., grants, stats)
unique_keyColumn(s)For incremental models — the merge key
partition_byDictBigQuery/Snowflake partitioning config
cluster_byListBigQuery/Snowflake clustering config

Full Working Example

A complete staging → intermediate → mart pipeline:

-- models/staging/stg_payments.sql
{{ config(materialized='view') }}

select
id as payment_id,
order_id,
payment_method,
amount / 100.0 as amount_dollars,
status,
cast(created_at as timestamp) as created_at
from {{ source('stripe', 'payments') }}
-- models/intermediate/int_order_payments.sql
{{ config(materialized='view') }}

select
order_id,
sum(amount_dollars) as total_paid,
max(case when payment_method = 'bank_transfer' then 1 end) = 1 as paid_by_bank,
count(*) as payment_count
from {{ ref('stg_payments') }}
where status = 'success'
group by order_id
-- models/marts/fct_orders.sql
{{ config(materialized='table', tags=['mart', 'finance']) }}

select
o.order_id,
o.customer_id,
o.created_at,
o.amount_dollars as order_amount,
coalesce(p.total_paid, 0) as total_paid,
o.amount_dollars - coalesce(p.total_paid, 0) as outstanding_balance,
p.payment_count,
p.paid_by_bank,
c.customer_name,
c.country
from {{ ref('stg_orders') }} o
left join {{ ref('int_order_payments') }} p on o.order_id = p.order_id
left join {{ ref('stg_customers') }} c on o.customer_id = c.customer_id

Next Steps