Calabi Transform Models
A model in Calabi Transform is a SQL SELECT statement saved as a .sql file. When you run Calabi Transform, each model is compiled and executed against your data warehouse, materialising the result as a table, view, or other structure. Models are the core building block of the transformation layer — they define how raw data becomes analytics-ready.
Model Anatomy
Every model is a plain SQL file that contains a single SELECT statement. There is no CREATE TABLE or INSERT INTO — Calabi Transform handles the DDL automatically based on the configured materialisation.
-- models/staging/stg_orders.sql
select
id as order_id,
customer_id,
status,
amount / 100.0 as amount_dollars,
cast(created_at as timestamp) as created_at,
_etl_loaded_at
from {{ source('jaffle_shop', 'orders') }}
where status != 'deleted'
Key observations:
- The model references a raw source table using the {{ source() }} function.
- Column transformations (type casting, renaming, normalisation) are applied inline.
- No DDL is written — Calabi Transform infers this from the materialisation setting.
The ref() Function
The ref() function is the primary way models reference other models. Using ref() instead of hardcoded table names gives Calabi Transform three important capabilities:
- Dependency resolution — Calabi Transform builds a DAG of model dependencies and executes them in the correct order.
- Cross-environment portability — ref() resolves to the correct schema (dev, staging, production) based on your target configuration.
- Lineage tracking — Calabi Catalogue automatically visualises the upstream/downstream relationships between models.
-- models/intermediate/int_orders_with_customers.sql
select
o.order_id,
o.amount_dollars,
o.created_at,
c.customer_name,
c.email,
c.country
from {{ ref('stg_orders') }} o
left join {{ ref('stg_customers') }} c
on o.customer_id = c.customer_id
Here, int_orders_with_customers depends on both stg_orders and stg_customers. Calabi Transform will build those two models before building this one.
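Because the DAG is known, you can ask for a model together with its ancestors rather than running the whole project. A minimal sketch, assuming the CLI follows dbt-style graph selection (verify the exact syntax against your installed version):

```shell
# Build int_orders_with_customers plus everything upstream of it.
# The leading "+" selects the model's ancestors (dbt-style graph
# selection -- assumed syntax, check your CLI's documentation).
dbt run --select +int_orders_with_customers
```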
Model Layers (Naming Conventions)
A well-structured Calabi Transform project organises models into layers by their purpose. The recommended three-layer architecture is:
models/
├── staging/ # One-to-one with raw source tables; light cleaning only
│ ├── stg_orders.sql
│ └── stg_customers.sql
├── intermediate/ # Business logic that joins or reshapes staging models
│ ├── int_orders_with_customers.sql
│ └── int_daily_revenue.sql
└── marts/ # Wide, analytics-ready tables consumed by BI tools
├── fct_orders.sql
└── dim_customers.sql
| Layer | Prefix | Purpose | Consumers |
|---|---|---|---|
| Staging | stg_ | Select and lightly clean one source table. No joins. | Intermediate models |
| Intermediate | int_ | Apply business logic: joins, aggregations, pivots. | Mart models |
| Marts | fct_, dim_ | Final wide tables and fact/dimension tables for reporting. | BI tools, analysts |
Layering makes the project navigable and testable. A bug in a staging model affects all downstream consumers — you fix it once. Intermediate models isolate complex logic. Marts are simple and fast to query.
Materialisations
The materialisation controls how Calabi Transform persists a model's results in the warehouse. Set it in the model's config block or in dbt_project.yml.
View (default)
{{ config(materialized='view') }}
select * from {{ ref('stg_orders') }}
- Calabi Transform creates (or replaces) a database view.
- No data is stored — the query runs at read time.
- Zero storage cost, but the query re-runs on every read, so complex models that are queried frequently can be slow.
- Best for: staging models, simple transformations, development.
Table
{{ config(materialized='table') }}
select
date_trunc('month', created_at) as month,
sum(amount_dollars) as revenue
from {{ ref('int_orders_with_customers') }}
group by 1
- Calabi Transform drops and recreates a physical table on every run.
- Fast to query; consumes warehouse storage.
- Best for: mart models queried by BI tools, aggregations, reporting tables.
Incremental
{{ config(
materialized='incremental',
unique_key='order_id',
on_schema_change='sync_all_columns'
) }}
select
order_id,
customer_id,
amount_dollars,
created_at
from {{ ref('stg_orders') }}
{% if is_incremental() %}
-- Only process rows newer than the last run
where created_at > (select max(created_at) from {{ this }})
{% endif %}
- On the first run, behaves like a full table build.
- On subsequent runs, only processes new/changed rows and merges (upserts) them into the existing table using unique_key.
- Best for: large event tables, append-heavy tables, tables where a full rebuild would be prohibitively slow.
- Requires a reliable cursor column (created_at, updated_at) that identifies new rows.
| Parameter | Description |
|---|---|
| unique_key | Column(s) used to match new rows to existing rows for upsert |
| on_schema_change | What to do when columns are added/removed: ignore, fail, append_new_columns, sync_all_columns |
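A strict `>` cursor filter can miss late-arriving or updated rows. One common refinement is to reprocess a short lookback window and let unique_key deduplicate on merge. A sketch, assuming stg_orders also exposes an updated_at column (not shown in the earlier example) and a warehouse that accepts this interval syntax:

```sql
-- Sketch: reprocess a 3-day lookback window so late-arriving updates
-- are picked up; unique_key='order_id' deduplicates on merge.
-- Assumes an updated_at column exists; interval syntax varies by warehouse.
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    amount_dollars,
    created_at,
    updated_at
from {{ ref('stg_orders') }}
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }}) - interval '3 days'
{% endif %}
```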
Ephemeral
{{ config(materialized='ephemeral') }}
select
order_id,
amount_dollars * 1.1 as amount_with_tax
from {{ ref('stg_orders') }}
- Not persisted to the warehouse at all — compiled as a Common Table Expression (CTE) inline within the models that reference it.
- Zero storage, zero warehouse objects; increases complexity of compiled SQL.
- Best for: small intermediate transformations that are used in exactly one downstream model.
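When a downstream model references an ephemeral model, the compiled SQL inlines the ephemeral query as a CTE instead of resolving it to a warehouse object. Roughly, and only as a sketch (the CTE alias, schema names, and model name here are illustrative, not exact compiler output):

```sql
-- Approximate compiled output of a model that refs the ephemeral
-- model above. The ephemeral SQL becomes a CTE; no view or table
-- named after it ever exists in the warehouse.
with orders_with_tax as (
    select
        order_id,
        amount_dollars * 1.1 as amount_with_tax
    from analytics.staging.stg_orders
)
select
    order_id,
    amount_with_tax
from orders_with_tax
where amount_with_tax > 100
```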
Materialisation Comparison
| Materialisation | Storage | Query Speed | Rebuild Time | Best For |
|---|---|---|---|---|
| view | None | Slowest | Instant | Staging, development |
| table | Full | Fastest | Slow (full rebuild) | Marts, BI-facing |
| incremental | Partial delta | Fast | Fast (incremental) | Large fact tables |
| ephemeral | None | Inline with parent | Instant | Single-use CTEs |
Model Configuration
Configuration can be set at three levels (most specific wins):
1. In the model SQL file (config block)
{{ config(
materialized='incremental',
schema='marts',
tags=['daily', 'finance'],
unique_key='order_id',
post_hook="GRANT SELECT ON {{ this }} TO ROLE reporter"
) }}
select ...
2. In dbt_project.yml (project-wide defaults)
models:
my_project:
staging:
+materialized: view
+schema: staging
intermediate:
+materialized: view
+schema: intermediate
marts:
+materialized: table
+schema: marts
+tags: ["marts"]
3. In schema.yml files (documentation + tests, not materialisation)
models:
- name: fct_orders
description: "One row per order, fully enriched with customer data."
columns:
- name: order_id
description: "Primary key — unique identifier for each order."
data_tests:
- unique
- not_null
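To see "most specific wins" in practice: the dbt_project.yml above sets +materialized: table for everything under marts/, but an in-file config block overrides it for one model. A sketch using a hypothetical model name (fct_events and stg_events are illustrative, not from this project):

```sql
-- models/marts/fct_events.sql (hypothetical)
-- Project default for marts/ is table, but the in-file config is
-- more specific, so this one model builds incrementally instead.
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    event_type,
    occurred_at
from {{ ref('stg_events') }}
{% if is_incremental() %}
where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}
```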
Useful Config Options
| Config Key | Values | Description |
|---|---|---|
| materialized | view, table, incremental, ephemeral | How to persist the model |
| schema | String | Override the schema for this model |
| database | String | Override the database (multi-database projects) |
| alias | String | Override the table/view name in the warehouse |
| tags | List | Labels for selecting model subsets (dbt run --select tag:finance) |
| enabled | true / false | Disable a model without deleting it |
| pre_hook | SQL string | SQL to run before the model |
| post_hook | SQL string | SQL to run after the model (e.g., grants, stats) |
| unique_key | Column(s) | For incremental models — the merge key |
| partition_by | Dict | BigQuery/Snowflake partitioning config |
| cluster_by | List | BigQuery/Snowflake clustering config |
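Tags are most useful at run time, where they let you execute a subset of the project. A sketch, assuming dbt-style selection flags as referenced in the table above (confirm against your installed CLI):

```shell
# Run only models tagged 'finance' (e.g., the fct_orders-style marts).
dbt run --select tag:finance

# Or run everything except the daily-scheduled models.
dbt run --exclude tag:daily
```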
Full Working Example
A complete staging → intermediate → mart pipeline:
-- models/staging/stg_payments.sql
{{ config(materialized='view') }}
select
id as payment_id,
order_id,
payment_method,
amount / 100.0 as amount_dollars,
status,
cast(created_at as timestamp) as created_at
from {{ source('stripe', 'payments') }}
-- models/intermediate/int_order_payments.sql
{{ config(materialized='view') }}
select
order_id,
sum(amount_dollars) as total_paid,
max(case when payment_method = 'bank_transfer' then 1 end) = 1 as paid_by_bank,
count(*) as payment_count
from {{ ref('stg_payments') }}
where status = 'success'
group by order_id
-- models/marts/fct_orders.sql
{{ config(materialized='table', tags=['mart', 'finance']) }}
select
o.order_id,
o.customer_id,
o.created_at,
o.amount_dollars as order_amount,
coalesce(p.total_paid, 0) as total_paid,
o.amount_dollars - coalesce(p.total_paid, 0) as outstanding_balance,
p.payment_count,
p.paid_by_bank,
c.customer_name,
c.country
from {{ ref('stg_orders') }} o
left join {{ ref('int_order_payments') }} p on o.order_id = p.order_id
left join {{ ref('stg_customers') }} c on o.customer_id = c.customer_id
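In practice the mart would ship with a schema.yml entry alongside it. A sketch in the same style as the earlier schema.yml example (the file path and the total_paid tests are illustrative additions, not from the project above):

```yaml
# models/marts/schema.yml -- hypothetical companion file for fct_orders
models:
  - name: fct_orders
    description: "One row per order with payment totals and customer data."
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: total_paid
        description: "Sum of successful payments; 0 when no payments exist."
        data_tests:
          - not_null
```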
Next Steps
- Sources & Raw Data — Define and test your raw source tables
- Data Tests — Validate model output quality
- Running Transformations — Execute models and check results