Data Tests
Calabi Transform's data testing framework lets you assert expectations about your data and get alerted when those expectations are violated. Tests run against the warehouse after models are built, catching data quality issues before they reach dashboards or downstream consumers. Test failures surface in the run log, in Calabi Catalogue's data quality panel, and in any configured alert channels.
Test Types Overview
| Type | Where It Lives | What It Does |
|---|---|---|
| Generic | .yml files alongside models/sources | Reusable parameterised test applied to columns |
| Singular | tests/ directory as .sql files | One-off SQL assertion custom to your data |
| Custom generic | tests/generic/ as Jinja macros | Build your own reusable generic test |
Generic Tests
Calabi Transform ships with four built-in generic tests that cover the most common data quality requirements.
not_null
Asserts that a column contains no NULL values.
models:
- name: fct_orders
columns:
- name: order_id
data_tests:
- not_null
- name: customer_id
data_tests:
- not_null
unique
Asserts that all values in a column are distinct.
models:
- name: fct_orders
columns:
- name: order_id
data_tests:
- unique
accepted_values
Asserts that a column contains only values from a predefined list.
models:
- name: stg_orders
columns:
- name: status
data_tests:
- accepted_values:
values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
Set quote: false if comparing against numeric or boolean values:
- accepted_values:
values: [1, 2, 3]
quote: false
relationships
Asserts referential integrity — every value in the column exists in a related table.
models:
- name: fct_orders
columns:
- name: customer_id
data_tests:
- relationships:
to: ref('dim_customers')
field: customer_id
To test against a source table:
- relationships:
to: source('jaffle_shop', 'customers')
field: id
Applying Multiple Tests to One Column
You can stack as many tests as needed on a single column:
models:
- name: fct_orders
columns:
- name: order_id
data_tests:
- not_null
- unique
- name: status
data_tests:
- not_null
- accepted_values:
values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
- name: customer_id
data_tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
- name: amount_dollars
data_tests:
- not_null
Singular Tests
Singular tests are custom SQL files in the tests/ directory. They are flexible enough to express any assertion your business logic requires. A singular test passes if it returns zero rows — returning rows means the assertion was violated.
Example: No negative revenue
-- tests/assert_orders_have_positive_revenue.sql
select
order_id,
amount_dollars
from {{ ref('fct_orders') }}
where amount_dollars < 0
Example: Payment amounts must not exceed order amounts
-- tests/assert_payments_not_greater_than_orders.sql
select
o.order_id,
o.amount_dollars as order_amount,
p.total_paid
from {{ ref('fct_orders') }} o
join {{ ref('int_order_payments') }} p using (order_id)
where p.total_paid > o.amount_dollars * 1.01 -- allow 1% rounding tolerance
Example: Every order must have a corresponding customer
-- tests/assert_all_orders_have_customers.sql
select o.order_id
from {{ ref('stg_orders') }} o
left join {{ ref('stg_customers') }} c on o.customer_id = c.customer_id
where c.customer_id is null
Test Severity
By default, a test failure causes the test run to exit with an error code (severity: error). You can configure individual tests to emit a warning instead — useful for tests that catch data quality drift without blocking the pipeline.
Setting severity in YAML
models:
- name: fct_orders
columns:
- name: status
data_tests:
- accepted_values:
values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
config:
severity: warn # Warn but don't fail the run
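Severity can also be set from inside a singular test's SQL file. As a sketch, assuming Calabi Transform follows dbt's config block syntax for singular tests, the revenue test from earlier could be downgraded to a warning:
-- tests/assert_orders_have_positive_revenue.sql
{{ config(severity='warn') }}

select
    order_id,
    amount_dollars
from {{ ref('fct_orders') }}
where amount_dollars < 0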
Threshold-based severity (error_if / warn_if)
Instead of failing on any violation, set thresholds:
- not_null:
config:
error_if: ">100" # Error if more than 100 rows fail
warn_if: ">10" # Warn if more than 10 rows fail
This is useful when a small number of null values is expected (e.g., optional fields on legacy records) but a large number indicates a systemic problem.
Storing Test Failures
Use the --store-failures flag to write failing test rows to the warehouse. This makes debugging much faster — instead of re-running the model and test, you can query the failure table directly.
dbt test --store-failures
Failure tables are written to a schema named dbt_test__audit (configurable) and named after the test. For example, the unique test on fct_orders.order_id writes failures to:
dbt_test__audit.unique_fct_orders_order_id
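You can then query that table directly to inspect the offending rows, for example (adjust the schema name if you have configured a different audit schema):
-- Inspect the duplicated keys captured by the failing unique test
select *
from dbt_test__audit.unique_fct_orders_order_id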
Persisting failures per model
models:
- name: fct_orders
config:
store_failures: true
columns:
- name: order_id
data_tests:
- unique
- not_null
Custom Generic Tests
If you need a reusable test that doesn't exist in Calabi Transform's built-ins, write a custom generic test using Jinja macros.
Example: test that a numeric column is within a range
Create the file at tests/generic/test_between.sql:
-- tests/generic/test_between.sql
{% test between(model, column_name, min_value, max_value) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} < {{ min_value }}
or {{ column_name }} > {{ max_value }}
{% endtest %}
Apply it in YAML like any built-in test:
models:
- name: fct_orders
columns:
- name: discount_pct
data_tests:
- between:
min_value: 0
max_value: 100
Example: test that a date column is not in the future
-- tests/generic/test_not_in_future.sql
{% test not_in_future(model, column_name) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} > current_timestamp
{% endtest %}
Apply it to a date column in the same way:
- name: created_at
data_tests:
- not_in_future
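Custom generic test arguments can also carry defaults, so the YAML only needs to override them when necessary. A sketch, assuming standard Jinja default-argument syntax; the non_negative test name here is illustrative, not a built-in:
-- tests/generic/test_non_negative.sql
{% test non_negative(model, column_name, tolerance=0) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} < -{{ tolerance }}
{% endtest %}
Applied without arguments, it uses the default tolerance of 0:
- name: amount_dollars
  data_tests:
    - non_negative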
Running Tests
# Run all tests in the project
dbt test
# Test a specific model
dbt test --select fct_orders
# Test a model and everything downstream of it
dbt test --select fct_orders+
# Test only sources
dbt test --select source:*
# Test a specific source
dbt test --select source:jaffle_shop
# Run tests with a tag
dbt test --select tag:critical
# Run tests and store failures
dbt test --store-failures
# Run tests with more parallelism (overrides the thread count from profiles.yml)
dbt test --threads 8
Test Results in Calabi Catalogue
Every test run result is published to Calabi Catalogue, where you can:
- View current test pass/fail status per model column
- See historical test results and trend charts (is data quality improving or degrading?)
- Filter tables by data quality score
- Set up notifications when a test transitions from passing to failing
Navigate to Calabi Catalogue → [Table Name] → Data Quality to see results for a specific model.
Apply a critical tag to tests that should block downstream use of a model. This allows you to filter by criticality in Calabi Catalogue and configure stricter alerting.
- not_null:
config:
tags: ['critical']
Testing Best Practices
| Practice | Recommendation |
|---|---|
| Primary keys | Apply unique + not_null to every primary key column |
| Foreign keys | Apply relationships to every foreign key column |
| Enumerated fields | Apply accepted_values to status, type, and category columns |
| Amounts | Apply a custom between test or singular test for reasonable ranges |
| Critical models | Tag key tests as critical and configure alerts in Calabi Catalogue |
| Source tests | Apply not_null and unique to source primary keys — catch issues at the raw layer |
| Severity | Use warn for soft rules, error for hard data contracts |
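Putting the source-testing recommendation into practice, a minimal sketch of a sources .yml entry (the jaffle_shop names mirror the earlier examples; adapt them to your own raw layer):
sources:
  - name: jaffle_shop
    tables:
      - name: customers
        columns:
          - name: id
            data_tests:
              - unique
              - not_null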
Next Steps
- Running Transformations — Execute your models and tests via Calabi Pipelines or CLI
- Sources & Raw Data — Apply tests to raw source tables
- Models — Understand model structure and materialisation