Data Quality Tests
Data Quality Tests in Calabi Catalogue give you automated, continuously run assertions over your data. Tests run on a schedule, results are stored historically, and failures trigger alerts — so data problems are caught before they reach dashboards and reports.
Test Architecture
Test Types
| Test type | What it checks | Example |
|---|---|---|
| Not Null | Column has no NULL values | candidate_id IS NOT NULL |
| Unique | All values in a column are distinct | attempt_id is unique |
| Not In Set | Values do not appear in a forbidden list | status NOT IN ('voided', 'test') |
| In Set | Values are restricted to an allowed list | region IN ('EMEA', 'AMER', 'APAC') |
| Minimum | All values meet a lower bound | score >= 0 |
| Maximum | All values meet an upper bound | score <= 100 |
| Between (Range) | Values fall within a range | pass_threshold BETWEEN 50 AND 90 |
| Min Row Count | Table has at least N rows | At least 1,000 rows in scores |
| Max Row Count | Table has at most N rows | At most 100M rows in raw_events |
| Row Count Between | Row count falls in a range | scores between 500K and 50M rows |
| Column Values to Be of Type | Column data type matches expectation | exam_date is a timestamp |
| Custom SQL | Arbitrary SQL assertion | See examples below |
Creating a Test Suite
A Test Suite is a named collection of tests for one or more tables. Tests must belong to a suite before they can be scheduled.
Step 1 — Create the suite
- Navigate to Data Quality → Test Suites → + New Test Suite.
- Enter a Name (e.g., Exam Scores — Core Quality Checks).
- Add a Description and Owners.
- Click Save.
Step 2 — Add tables to the suite
- Open the suite.
- Click + Add table.
- Search for and select the tables you want to test.
Step 3 — Add test cases
- Inside the suite, select a table.
- Click + Add test case.
- Configure the test (see below).
- Click Save.
Configuring Individual Tests
Not Null test
| Field | Value |
|---|---|
| Test name | candidate_id_not_null |
| Column | candidate_id |
| Test type | Column Values To Not Be Null |
| Fail threshold | 0 (zero nulls allowed) |
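Conceptually, this test compiles to a simple SQL assertion. A rough sketch, assuming the suite's table is psychometric_reporting.scores as in the custom SQL examples below (the SQL the Catalogue actually generates may differ):

```sql
-- Sketch of what candidate_id_not_null asserts; passes when the result is 0
SELECT COUNT(*)
FROM psychometric_reporting.scores
WHERE candidate_id IS NULL;
```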
Unique test
| Field | Value |
|---|---|
| Test name | attempt_id_unique |
| Column | attempt_id |
| Test type | Column Values To Be Unique |
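In SQL terms, uniqueness amounts to comparing total and distinct counts. A rough sketch against the same scores table (illustrative, not the Catalogue's generated SQL):

```sql
-- Sketch of what attempt_id_unique asserts; passes when the result is 0
-- COUNT(attempt_id) and COUNT(DISTINCT attempt_id) both ignore NULLs
SELECT COUNT(attempt_id) - COUNT(DISTINCT attempt_id)
FROM psychometric_reporting.scores;
```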
Range test
| Field | Value |
|---|---|
| Test name | score_range_0_to_100 |
| Column | score |
| Test type | Column Values To Be Between |
| Min value | 0 |
| Max value | 100 |
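The equivalent hand-written check counts out-of-range rows. A sketch, again illustrative:

```sql
-- Sketch of what score_range_0_to_100 asserts; passes when the result is 0
-- NULL scores are not counted here; pair this with a Not Null test if needed
SELECT COUNT(*)
FROM psychometric_reporting.scores
WHERE score < 0 OR score > 100;
```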
Row count test
| Field | Value |
|---|---|
| Test name | scores_min_row_count |
| Test type | Table Row Count To Be Between |
| Min value | 500000 |
| Max value | — (leave blank for no upper limit) |
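This test reduces to a plain row count compared against the configured bound. A sketch:

```sql
-- Sketch of what scores_min_row_count asserts; passes when the result is >= 500000
SELECT COUNT(*)
FROM psychometric_reporting.scores;
```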
Custom SQL Tests
Custom SQL tests let you write any assertion that evaluates to a single numeric value. The test passes when the result equals zero (no failures found).
```sql
-- Test: no future exam dates
-- Pass condition: result = 0 (no rows with exam_date in the future)
SELECT COUNT(*)
FROM psychometric_reporting.scores
WHERE exam_date > CURRENT_TIMESTAMP;
```

```sql
-- Test: pass rate is not implausibly high (> 99%)
-- Pass condition: result = 0 (rate is within plausible range)
-- NULLIF guards against division by zero when no rows match the date filter
SELECT CASE
         WHEN SUM(CASE WHEN passed THEN 1 ELSE 0 END) * 1.0
              / NULLIF(COUNT(*), 0) > 0.99
         THEN 1
         ELSE 0
       END
FROM psychometric_reporting.scores
WHERE exam_date >= CURRENT_DATE - INTERVAL '1 day';
```

```sql
-- Test: no orphaned candidate_id references
-- Pass condition: result = 0 (every score row has a matching candidate)
SELECT COUNT(*)
FROM psychometric_reporting.scores s
WHERE NOT EXISTS (
  SELECT 1 FROM dim.candidates c WHERE c.candidate_id = s.candidate_id
);
```

```sql
-- Test: no duplicate (candidate_id, exam_id, exam_date) combinations
-- Pass condition: result = 0 (total row count equals distinct combination count)
SELECT COUNT(*) - COUNT(DISTINCT candidate_id || '|' || exam_id || '|' || exam_date::text)
FROM psychometric_reporting.scores;
```
Custom SQL test — configuration
| Field | Value |
|---|---|
| Test name | no_future_exam_dates |
| Test type | Custom SQL |
| SQL expression | Paste your SQL (must return a single number) |
| Pass condition | EqualTo |
| Pass value | 0 |
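Because the pass condition compares a single number, you can bake a tolerance into the SQL itself and keep EqualTo 0. A sketch of a null-rate test with a 0.5% tolerance (optional_score is a hypothetical column used for illustration):

```sql
-- Test: null rate in an optional column stays under 0.5%
-- Pass condition: EqualTo 0 (the tolerance lives in the SQL, not the test config)
-- optional_score is a hypothetical column; substitute your own
SELECT CASE
         WHEN AVG(CASE WHEN optional_score IS NULL THEN 1.0 ELSE 0.0 END) > 0.005
         THEN 1
         ELSE 0
       END
FROM psychometric_reporting.scores;
```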
Scheduling Tests
- Open the test suite.
- Click Schedule Pipeline → + Add ingestion.
- Configure the schedule:
| Field | Description | Example |
|---|---|---|
| Name | Pipeline name | scores-quality-daily |
| Schedule | Cron expression | 0 6 * * * (daily at 6 AM) |
| Timezone | Cron evaluation timezone | UTC |
| Retry count | Retries on failure | 2 |
- Click Deploy.
The test pipeline appears in Settings → Ingestion Pipelines and can be monitored, paused, or run manually from there.
Recommended schedules
| Data freshness | Recommended test schedule |
|---|---|
| Real-time / near-real-time | Every 15–60 minutes |
| Daily ETL | 30–60 minutes after the ETL completes |
| Weekly batch | Weekly, 2 hours after batch window |
| Archived / reference data | Monthly |
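For reference, cron expressions matching these recommendations might look like the following (times are illustrative; align them with your own ETL windows):

```
*/30 * * * *   # near-real-time: every 30 minutes
0 7 * * *      # daily ETL finishing around 6 AM: test at 7 AM
0 8 * * 1      # weekly batch ending Monday morning: test at 8 AM
0 6 1 * *      # reference data: first of each month at 6 AM
```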
Running Tests Manually
- Navigate to the asset page → Quality tab.
- Click Run Tests → Run all tests (or select specific tests).
- Results appear within seconds to minutes depending on table size.
Or from the test suite:
- Data Quality → Test Suites → open the suite.
- Click Run Now (top-right).
Viewing Test Results
On the asset page
- Open any table's asset page.
- Click the Quality tab.
The Quality tab shows:
- A summary badge: Pass / Fail / Warning count.
- A trend graph of historical pass rates.
- Per-test results with status, last run time, and failure details.
Test result status codes
| Status | Meaning |
|---|---|
| Success | The assertion passed with zero failures |
| Failed | The assertion returned a non-zero count |
| Aborted | The test run was cancelled or timed out |
| Queued | The test is scheduled but not yet started |
| Running | Currently executing |
Viewing failure details
Click on a Failed test to see:
- The number of rows that violated the assertion.
- A sample of up to 10 failing values (for column-level tests).
- The SQL generated for the test (expandable).
- The full error message if the test threw an exception.
Alerting on Test Failures
Configure alerts so the right team is notified immediately when quality degrades.
- Navigate to Settings → Alerts & Reports (Admin) or Governance → Quality Alerts.
- Click + New Alert.
- Configure:
| Field | Description | Example |
|---|---|---|
| Name | Alert name | Scores Table — Critical Failure Alert |
| Entity | Table or test suite to watch | psychometric_reporting.scores |
| Trigger | Condition to alert on | Any test failure |
| Recipients | Email addresses | data-engineering@yourcompany.com |
| Slack channel | Optional Slack destination | #data-alerts |
| Webhook URL | Optional HTTP endpoint | PagerDuty / OpsGenie integration URL |
- Click Save.
Alert channels
| Channel | Setup |
|---|---|
| Email | Configure SMTP under Settings → Email; add recipient email addresses |
| Slack | Configure Slack integration; specify channel name with # |
| Webhook | POST a JSON payload to any HTTP endpoint (PagerDuty, OpsGenie, Teams) |
dbt Test Integration
If your Calabi Transform (dbt) project includes tests, Calabi Catalogue automatically ingests dbt test results as part of the dbt metadata ingestion pipeline.
dbt test results appear on the same Quality tab as native Catalogue tests, clearly labelled with their source:
| Test | Source |
|---|---|
not_null:candidate_id | dbt |
unique:attempt_id | dbt |
accepted_values:region | dbt |
no_future_exam_dates | Catalogue Custom SQL |
This gives you a single quality dashboard per table regardless of whether tests were defined in dbt or Calabi Catalogue.
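For reference, the dbt-sourced tests in the table above correspond to declarations like the following in a dbt schema file (standard dbt test syntax; the model and file names are illustrative):

```yaml
# models/schema.yml (illustrative)
models:
  - name: scores
    columns:
      - name: candidate_id
        tests:
          - not_null
      - name: attempt_id
        tests:
          - unique
      - name: region
        tests:
          - accepted_values:
              values: ['EMEA', 'AMER', 'APAC']
```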
Test Result History
The Catalogue retains full result history for every test. Historical results power:
- Trend charts on the Quality tab — see pass rates rise or fall over time.
- SLO tracking — compute uptime and reliability metrics for critical tables.
- Incident investigation — identify exactly when a failure first appeared.
History is retained for 90 days on the Professional tier. Contact your administrator to configure custom retention.
Quality Score
Each table in Calabi Catalogue is automatically assigned a Quality Score (0–100%) based on the ratio of passing tests to total tests in the most recent run. For example, a table with nine of its ten tests passing scores 90%.
| Score | Badge |
|---|---|
| 90–100% | Green |
| 70–89% | Yellow |
| < 70% | Red |
Quality Scores appear on asset pages, in search results, and can be used as a filter in the Explore page.
Best Practices
| Practice | Why |
|---|---|
| Test every column that appears in a certified metric | Ensures metric inputs are clean |
| Add a row count test to every critical table | Catch missing data loads before dashboards show wrong numbers |
| Use custom SQL tests for business-rule assertions that schema tests can't express | Maximum coverage |
| Set failure alerts for Gold-tier tables | Immediate notification when production data degrades |
| Run tests after every ETL job, not just on a time schedule | Aligns test results to data freshness |
| Review and update test thresholds as data grows | Prevents false failures from legitimate volume growth |