Data Quality Tests
Data Quality Tests in Calabi Catalogue give you automated, continuously run assertions over your data. Tests run on a schedule, results are stored historically, and failures trigger alerts — so data problems are caught before they reach dashboards and reports.
Test Architecture
Test Types
| Test type | What it checks | Example |
|---|---|---|
| Not Null | Column has no NULL values | candidate_id IS NOT NULL |
| Unique | All values in a column are distinct | attempt_id is unique |
| Not In Set | Values do not appear in a forbidden list | status NOT IN ('voided', 'test') |
| In Set | Values are restricted to an allowed list | region IN ('EMEA', 'AMER', 'APAC') |
| Minimum | All values meet a lower bound | score >= 0 |
| Maximum | All values meet an upper bound | score <= 100 |
| Between (Range) | Values fall within a range | pass_threshold BETWEEN 50 AND 90 |
| Min Row Count | Table has at least N rows | At least 1,000 rows in scores |
| Max Row Count | Table has at most N rows | At most 100M rows in raw_events |
| Row Count Between | Row count falls in a range | scores between 500K and 50M rows |
| Column Values to Be of Type | Column data type matches expectation | exam_date is a timestamp |
| Custom SQL | Arbitrary SQL assertion | See examples below |
Creating a Test Suite
A Test Suite is a named collection of tests for one or more tables. Tests must belong to a suite before they can be scheduled.
Step 1 — Create the suite
- Navigate to Data Quality → Test Suites → + New Test Suite.
- Enter a Name (e.g., Exam Scores — Core Quality Checks).
- Add a Description and Owners.
- Click Save.
Step 2 — Add tables to the suite
- Open the suite.
- Click + Add table.
- Search for and select the tables you want to test.
Step 3 — Add test cases
- Inside the suite, select a table.
- Click + Add test case.
- Configure the test (see below).
- Click Save.
Configuring Individual Tests
Not Null test
| Field | Value |
|---|---|
| Test name | candidate_id_not_null |
| Column | candidate_id |
| Test type | Column Values To Not Be Null |
| Fail threshold | 0 (zero nulls allowed) |
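Conceptually, this test compiles to a simple SQL assertion. A rough sketch, assuming the suite's table is psychometric_reporting.scores as in the custom SQL examples below (the SQL the Catalogue actually generates may differ):

```sql
-- Sketch of what candidate_id_not_null asserts; passes when the result is 0
SELECT COUNT(*)
FROM psychometric_reporting.scores
WHERE candidate_id IS NULL;
```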
Unique test
| Field | Value |
|---|---|
| Test name | attempt_id_unique |
| Column | attempt_id |
| Test type | Column Values To Be Unique |
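In SQL terms, uniqueness amounts to comparing total and distinct counts. A rough sketch against the same scores table (illustrative, not the Catalogue's generated SQL):

```sql
-- Sketch of what attempt_id_unique asserts; passes when the result is 0
-- COUNT(attempt_id) and COUNT(DISTINCT attempt_id) both ignore NULLs
SELECT COUNT(attempt_id) - COUNT(DISTINCT attempt_id)
FROM psychometric_reporting.scores;
```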
Range test
| Field | Value |
|---|---|
| Test name | score_range_0_to_100 |
| Column | score |
| Test type | Column Values To Be Between |
| Min value | 0 |
| Max value | 100 |
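The equivalent hand-written check counts out-of-range rows. A sketch, again illustrative:

```sql
-- Sketch of what score_range_0_to_100 asserts; passes when the result is 0
-- NULL scores are not counted here; pair this with a Not Null test if needed
SELECT COUNT(*)
FROM psychometric_reporting.scores
WHERE score < 0 OR score > 100;
```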
Row count test
| Field | Value |
|---|---|
| Test name | scores_min_row_count |
| Test type | Table Row Count To Be Between |
| Min value | 500000 |
| Max value | — (leave blank for no upper limit) |
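This test reduces to a plain row count compared against the configured bound. A sketch:

```sql
-- Sketch of what scores_min_row_count asserts; passes when the result is >= 500000
SELECT COUNT(*)
FROM psychometric_reporting.scores;
```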
Custom SQL Tests
Custom SQL tests let you write any assertion that evaluates to a single numeric value. The test passes when the result equals zero (no failures found).
```sql
-- Test: no future exam dates
-- Pass condition: result = 0 (no rows with exam_date in the future)
SELECT COUNT(*)
FROM psychometric_reporting.scores
WHERE exam_date > CURRENT_TIMESTAMP;
```

```sql
-- Test: pass rate is not implausibly high (> 99%)
-- Pass condition: result = 0 (rate is within plausible range)
-- NULLIF guards against division by zero when no rows match the date filter
SELECT CASE
         WHEN SUM(CASE WHEN passed THEN 1 ELSE 0 END) * 1.0
              / NULLIF(COUNT(*), 0) > 0.99
         THEN 1
         ELSE 0
       END
FROM psychometric_reporting.scores
WHERE exam_date >= CURRENT_DATE - INTERVAL '1 day';
```

```sql
-- Test: no orphaned candidate_id references
-- Pass condition: result = 0 (every score row has a matching candidate)
SELECT COUNT(*)
FROM psychometric_reporting.scores s
WHERE NOT EXISTS (
  SELECT 1 FROM dim.candidates c WHERE c.candidate_id = s.candidate_id
);
```

```sql
-- Test: no duplicate (candidate_id, exam_id, exam_date) combinations
-- Pass condition: result = 0 (total row count equals distinct combination count)
SELECT COUNT(*) - COUNT(DISTINCT candidate_id || '|' || exam_id || '|' || exam_date::text)
FROM psychometric_reporting.scores;
```
Custom SQL test — configuration
| Field | Value |
|---|---|
| Test name | no_future_exam_dates |
| Test type | Custom SQL |
| SQL expression | Paste your SQL (must return a single number) |
| Pass condition | EqualTo |
| Pass value | 0 |
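Because the pass condition compares a single number, you can bake a tolerance into the SQL itself and keep EqualTo 0. A sketch of a null-rate test with a 0.5% tolerance (optional_score is a hypothetical column used for illustration):

```sql
-- Test: null rate in an optional column stays under 0.5%
-- Pass condition: EqualTo 0 (the tolerance lives in the SQL, not the test config)
-- optional_score is a hypothetical column; substitute your own
SELECT CASE
         WHEN AVG(CASE WHEN optional_score IS NULL THEN 1.0 ELSE 0.0 END) > 0.005
         THEN 1
         ELSE 0
       END
FROM psychometric_reporting.scores;
```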
Scheduling Tests
- Open the test suite.
- Click Schedule Pipeline → + Add ingestion.
- Configure the schedule:
| Field | Description | Example |
|---|---|---|
| Name | Pipeline name | scores-quality-daily |
| Schedule | Cron expression | 0 6 * * * (daily at 6 AM) |
| Timezone | Cron evaluation timezone | UTC |
| Retry count | Retries on failure | 2 |
- Click Deploy.
The test pipeline appears in Settings → Ingestion Pipelines and can be monitored, paused, or run manually from there.
Recommended schedules
| Data freshness | Recommended test schedule |
|---|---|
| Real-time / near-real-time | Every 15–60 minutes |
| Daily ETL | 30–60 minutes after the ETL completes |
| Weekly batch | Weekly, 2 hours after batch window |
| Archived / reference data | Monthly |
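For reference, cron expressions matching these recommendations might look like the following (times are illustrative; align them with your own ETL windows):

```
*/30 * * * *   # near-real-time: every 30 minutes
0 7 * * *      # daily ETL finishing around 6 AM: test at 7 AM
0 8 * * 1      # weekly batch ending Monday morning: test at 8 AM
0 6 1 * *      # reference data: first of each month at 6 AM
```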
Running Tests Manually
- Navigate to the asset page → Quality tab.
- Click Run Tests → Run all tests (or select specific tests).
- Results appear within seconds to minutes depending on table size.
Or from the test suite:
- Data Quality → Test Suites → open the suite.
- Click Run Now (top-right).
Viewing Test Results
On the asset page
- Open any table's asset page.
- Click the Quality tab.
The Quality tab shows:
- A summary badge: Pass / Fail / Warning count.
- A trend graph of historical pass rates.
- Per-test results with status, last run time, and failure details.
Test result status codes
| Status | Meaning |
|---|---|
| Success | The assertion passed with zero failures |
| Failed | The assertion returned a non-zero count |
| Aborted | The test run was cancelled or timed out |
| Queued | The test is scheduled but not yet started |
| Running | Currently executing |
Viewing failure details
Click on a Failed test to see:
- The number of rows that violated the assertion.
- A sample of up to 10 failing values (for column-level tests).
- The SQL generated for the test (expandable).
- The full error message if the test threw an exception.
Alerting on Test Failures
Configure alerts so the right team is notified immediately when quality degrades.
- Navigate to Settings → Alerts & Reports (Admin) or Governance → Quality Alerts.
- Click + New Alert.
- Configure:
| Field | Description | Example |
|---|---|---|
| Name | Alert name | Scores Table — Critical Failure Alert |
| Entity | Table or test suite to watch | psychometric_reporting.scores |
| Trigger | Condition to alert on | Any test failure |
| Recipients | Email addresses | data-engineering@yourcompany.com |
| Slack channel | Optional Slack destination | #data-alerts |
| Webhook URL | Optional HTTP endpoint | PagerDuty / OpsGenie integration URL |
- Click Save.
Alert channels
| Channel | Setup |
|---|---|
| Email | Configure SMTP under Settings → Email; add recipient email addresses |
| Slack | Configure Slack integration; specify channel name with # |
| Webhook | POST a JSON payload to any HTTP endpoint (PagerDuty, OpsGenie, Teams) |
dbt Test Integration
If your Calabi Transform (dbt) project includes tests, Calabi Catalogue automatically ingests dbt test results as part of the dbt metadata ingestion pipeline.
dbt test results appear on the same Quality tab as native Catalogue tests, clearly labelled with their source:
| Test | Source |
|---|---|
not_null:candidate_id | dbt |
unique:attempt_id | dbt |
accepted_values:region | dbt |
no_future_exam_dates | Catalogue Custom SQL |
This gives you a single quality dashboard per table regardless of whether tests were defined in dbt or Calabi Catalogue.
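For reference, the dbt-sourced tests in the table above correspond to declarations like the following in a dbt schema file (standard dbt test syntax; the model and file names are illustrative):

```yaml
# models/schema.yml (illustrative)
models:
  - name: scores
    columns:
      - name: candidate_id
        tests:
          - not_null
      - name: attempt_id
        tests:
          - unique
      - name: region
        tests:
          - accepted_values:
              values: ['EMEA', 'AMER', 'APAC']
```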
Test Result History
The Catalogue retains full result history for every test. Historical results power:
- Trend charts on the Quality tab — see pass rates rise or fall over time.
- SLO tracking — compute uptime and reliability metrics for critical tables.
- Incident investigation — identify exactly when a failure first appeared.
History is retained for 90 days on the Professional tier. Contact your administrator to configure custom retention.
Quality Score
Each table in Calabi Catalogue is automatically assigned a Quality Score (0–100%) based on the ratio of passing tests to total tests in the most recent run. For example, a table with nine of its ten tests passing scores 90%.
| Score | Badge |
|---|---|
| 90–100% | Green |
| 70–89% | Yellow |
| < 70% | Red |
Quality Scores appear on asset pages, in search results, and can be used as a filter in the Explore page.
Best Practices
| Practice | Why |
|---|---|
| Test every column that appears in a certified metric | Ensures metric inputs are clean |
| Add a row count test to every critical table | Catch missing data loads before dashboards show wrong numbers |
| Use custom SQL tests for business-rule assertions that schema tests can't express | Maximum coverage |
| Set failure alerts for Gold-tier tables | Immediate notification when production data degrades |
| Run tests after every ETL job, not just on a time schedule | Aligns test results to data freshness |
| Review and update test thresholds as data grows | Prevents false failures from legitimate volume growth |