
Data Tests

Professional+

Calabi Transform's data testing framework lets you assert expectations about your data and get alerted when those expectations are violated. Tests run against the warehouse after models are built, catching data quality issues before they reach dashboards or downstream consumers. Test failures surface in the run log, in Calabi Catalogue's data quality panel, and in any configured alert channels.


Test Types Overview

Type           | Where It Lives                       | What It Does
Generic        | .yml files alongside models/sources  | Reusable parameterised test applied to columns
Singular       | tests/ directory as .sql files       | One-off SQL assertion custom to your data
Custom generic | tests/generic/ as Jinja macros       | Build your own reusable generic test

Generic Tests

Calabi Transform ships with four built-in generic tests that cover the most common data quality requirements.

not_null

Asserts that a column contains no NULL values.

models:
  - name: fct_orders
    columns:
      - name: order_id
        data_tests:
          - not_null
      - name: customer_id
        data_tests:
          - not_null

unique

Asserts that all values in a column are distinct.

models:
  - name: fct_orders
    columns:
      - name: order_id
        data_tests:
          - unique

accepted_values

Asserts that a column contains only values from a predefined list.

models:
  - name: stg_orders
    columns:
      - name: status
        data_tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']

Set quote: false if comparing against numeric or boolean values:

- accepted_values:
    values: [1, 2, 3]
    quote: false

relationships

Asserts referential integrity — every value in the column exists in a related table.

models:
  - name: fct_orders
    columns:
      - name: customer_id
        data_tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id

To test against a source table:

- relationships:
    to: source('jaffle_shop', 'customers')
    field: id

Applying Multiple Tests to One Column

You can stack as many tests as needed on a single column:

models:
  - name: fct_orders
    columns:
      - name: order_id
        data_tests:
          - not_null
          - unique
      - name: status
        data_tests:
          - not_null
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
      - name: customer_id
        data_tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: amount_dollars
        data_tests:
          - not_null

Singular Tests

Singular tests are custom SQL files in the tests/ directory. They are flexible enough to express any assertion your business logic requires. A singular test passes if it returns zero rows — returning rows means the assertion was violated.

Example: No negative revenue

-- tests/assert_orders_have_positive_revenue.sql

select
order_id,
amount_dollars
from {{ ref('fct_orders') }}
where amount_dollars < 0

Example: Payment amounts must not exceed order amounts

-- tests/assert_payments_not_greater_than_orders.sql

select
o.order_id,
o.amount_dollars as order_amount,
p.total_paid
from {{ ref('fct_orders') }} o
join {{ ref('int_order_payments') }} p using (order_id)
where p.total_paid > o.amount_dollars * 1.01 -- allow 1% rounding tolerance

Example: Every order must have a corresponding customer

-- tests/assert_all_orders_have_customers.sql

select o.order_id
from {{ ref('stg_orders') }} o
left join {{ ref('stg_customers') }} c on o.customer_id = c.customer_id
where c.customer_id is null

Test Severity

By default, a test failure causes the test run to exit with an error code (severity: error). You can configure individual tests to emit a warning instead — useful for tests that catch data quality drift without blocking the pipeline.

Setting severity in YAML

models:
  - name: fct_orders
    columns:
      - name: status
        data_tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
              config:
                severity: warn  # Warn but don't fail the run
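
Singular tests have no YAML entry. For those, severity can be set at the top of the SQL file itself with a config() block, assuming the standard dbt config() Jinja call applies here:

-- tests/assert_orders_have_positive_revenue.sql

{{ config(severity = 'warn') }}

select
    order_id,
    amount_dollars
from {{ ref('fct_orders') }}
where amount_dollars < 0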

Threshold-based severity (error_if / warn_if)

Instead of failing on any violation, set thresholds:

- not_null:
    config:
      error_if: ">100"  # Error if more than 100 rows fail
      warn_if: ">10"    # Warn if more than 10 rows fail

This is useful when a small number of null values is expected (e.g., optional fields on legacy records) but a large number indicates a systemic problem.


Storing Test Failures

Use the --store-failures flag to write failing test rows to the warehouse. This makes debugging much faster — instead of re-running the model and test, you can query the failure table directly.

dbt test --store-failures

Failure tables are written to a schema named dbt_test__audit (configurable) and named after the test. For example, the unique test on fct_orders.order_id writes failures to:

dbt_test__audit.unique_fct_orders_order_id
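
Once stored, debugging is a plain query against that table:

-- Inspect the duplicate order_id values caught by the unique test
select *
from dbt_test__audit.unique_fct_orders_order_id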

Persisting failures per model

models:
  - name: fct_orders
    config:
      store_failures: true
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null

Custom Generic Tests

If you need a reusable test that doesn't exist in Calabi Transform's built-ins, write a custom generic test using Jinja macros.

Example: test that a numeric column is within a range

Create the file at tests/generic/test_between.sql:

-- tests/generic/test_between.sql

{% test between(model, column_name, min_value, max_value) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} < {{ min_value }}
   or {{ column_name }} > {{ max_value }}

{% endtest %}

Apply it in YAML like any built-in test:

models:
  - name: fct_orders
    columns:
      - name: discount_pct
        data_tests:
          - between:
              min_value: 0
              max_value: 100
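
Test arguments can also be given Jinja defaults so that callers may omit them. A sketch of the same macro with assumed defaults of 0 and 100:

-- tests/generic/test_between.sql

{% test between(model, column_name, min_value=0, max_value=100) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} < {{ min_value }}
   or {{ column_name }} > {{ max_value }}

{% endtest %}

With defaults in place, a bare - between entry in YAML checks the 0 to 100 range.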

Example: test that a date column is not in the future

-- tests/generic/test_not_in_future.sql

{% test not_in_future(model, column_name) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} > current_timestamp

{% endtest %}

Apply it to any timestamp column:

- name: created_at
  data_tests:
    - not_in_future

Running Tests

# Run all tests in the project
dbt test

# Test a specific model
dbt test --select fct_orders

# Test a model and everything downstream of it
dbt test --select fct_orders+

# Test only sources
dbt test --select source:*

# Test a specific source
dbt test --select source:jaffle_shop

# Run tests with a tag
dbt test --select tag:critical

# Run tests and store failures
dbt test --store-failures

# Run tests with 8 threads (overriding the thread count in profiles.yml)
dbt test --threads 8
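
Tests can also run as part of dbt build, which executes each model and then its tests in DAG order, assuming the standard dbt CLI surface used elsewhere on this page:

# Build fct_orders and run its tests immediately afterwards
dbt build --select fct_orders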

Test Results in Calabi Catalogue

Every test run result is published to Calabi Catalogue, where you can:

  • View current test pass/fail status per model column
  • See historical test results and trend charts (is data quality improving or degrading?)
  • Filter tables by data quality score
  • Set up notifications when a test transitions from passing to failing

Navigate to Calabi Catalogue → [Table Name] → Data Quality to see results for a specific model.

Tag critical tests

Apply a critical tag to tests that should block downstream use of a model. This allows you to filter by criticality in Calabi Catalogue and configure stricter alerting.

- not_null:
    config:
      tags: ['critical']
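
With the tag in place, dbt test --select tag:critical (shown under Running Tests above) runs only these blocking tests.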

Testing Best Practices

Practice          | Recommendation
Primary keys      | Apply unique + not_null to every primary key column
Foreign keys      | Apply relationships to every foreign key column
Enumerated fields | Apply accepted_values to status, type, and category columns
Amounts           | Apply a custom between test or singular test for reasonable ranges
Critical models   | Tag key tests as critical and configure alerts in Calabi Catalogue
Source tests      | Apply not_null and unique to source primary keys to catch issues at the raw layer
Severity          | Use warn for soft rules, error for hard data contracts

Next Steps