Data Assets
A data asset is any artifact in your data ecosystem that has business value and can be owned, described, and governed. Calabi Catalogue tracks every type of data asset produced and consumed across your organization, from raw database tables to trained machine learning models, and gives each one a discoverable, documented home.
What Counts as a Data Asset?
Calabi recognizes the following asset types. Each type is indexed in Calabi Catalogue and supports ownership, tagging, descriptions, and lineage tracking.
| Asset Type | Description | Example |
|---|---|---|
| Table | A database table or view in any connected data warehouse or database. | prod.sales.fact_orders |
| Dashboard | A CalabiIQ dashboard or any externally registered BI dashboard. | "Q1 2026 Executive Summary" |
| Chart | An individual CalabiIQ chart or visualization. | "Daily Active Users — Line Chart" |
| Pipeline | A Calabi Pipelines DAG or any registered data pipeline. | daily_dbt_run |
| ML Model | A trained model registered in Calabi ML, with version and stage. | CustomerChurnModel v4 (Production) |
| Topic | A streaming data topic (Kafka, Kinesis, EventBridge). | orders.events.v2 |
| Container | An S3 bucket, GCS bucket, or Azure Blob container used as a data store. | acme-corp-raw-data |
| Report | A scheduled report, including PDF exports and email deliveries. | "Monthly Finance Report" |
| Metric | A defined business metric with a canonical SQL definition. | "Monthly Recurring Revenue (MRR)" |
| Glossary Term | A defined business term in the data glossary. | "Monthly Active User" |
| Data Product | A curated, governed collection of related assets for a specific domain. | "Customer 360 Data Product" |
Asset Ownership
Every data asset in Calabi Catalogue has an Owner — a user or team responsible for its quality, documentation, and fitness for use.
Owner Responsibilities
The assigned owner is expected to:
- Keep the asset's description accurate and up to date.
- Respond to quality alerts when data quality tests fail.
- Review and approve glossary term links to the asset.
- Participate in data contract discussions when downstream consumers are affected by schema changes.
- Mark the asset as deprecated when it should no longer be used.
Assigning Ownership
Ownership can be assigned in three ways:
- Manually in Calabi Catalogue: open the asset → Edit → Owner field → search for a user or team.
- Via the AI Agent: "Set the owner of the fact_orders table to the Data Engineering team."
- Programmatically via the Calabi Catalogue API:

      import requests

      # Replace <calabi-domain>, <table-id>, and <token> with your own values.
      response = requests.patch(
          "https://<calabi-domain>/api/v1/tables/<table-id>",
          json={"owner": {"type": "team", "id": "data-engineering"}},
          headers={"Authorization": "Bearer <token>"},
          timeout=30,
      )
      response.raise_for_status()  # fail loudly on a 4xx/5xx response
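When many tables need the same owner, the single call above can be wrapped in a small helper. The following is a minimal sketch using only the standard library. It assumes the endpoint and payload shape shown in the example above; the domain `calabi.example.com`, the table IDs, the token, and the helper name `build_owner_patch` are all hypothetical. The requests are built but not sent.

```python
import json
import urllib.request

# Hypothetical base URL: replace with your own Calabi domain.
BASE_URL = "https://calabi.example.com/api/v1"


def build_owner_patch(table_id: str, team_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets a team as table owner."""
    payload = {"owner": {"type": "team", "id": team_id}}
    return urllib.request.Request(
        url=f"{BASE_URL}/tables/{table_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical table IDs; one prepared request per table.
table_ids = ["tbl-001", "tbl-002"]
prepared = [build_owner_patch(t, "data-engineering", "example-token") for t in table_ids]
```

Each prepared request can then be sent with `urllib.request.urlopen(req)`, which raises `urllib.error.HTTPError` on a 4xx or 5xx response.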
Team Ownership
Assets can be owned by a team rather than an individual. Team ownership is recommended for assets used by multiple people, as it prevents a single point of failure when individuals leave or change roles.
Teams are managed in Admin → User Management → Teams.
Asset Lifecycle
Every data asset in Calabi Catalogue exists in one of the following lifecycle states.
| State | Description | Visibility |
|---|---|---|
| Active | Asset is in production use and suitable for consumption. | Fully visible in search and lineage |
| Deprecated | Asset is still functional but should not be used for new work. A deprecation notice is shown to users. | Visible with warning banner; excluded from default search results |
| Deleted | Asset has been removed from its source system. Calabi retains the metadata record for lineage and audit purposes. | Hidden from default search; visible with "Include deleted" filter |
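The visibility rules in the table can be read as a simple filter. The sketch below is illustrative only and is not Calabi's implementation. The `include_deleted` flag mirrors the "Include deleted" filter, while `include_deprecated` is an assumed opt-in, since the table only says deprecated assets are excluded from default results.

```python
from enum import Enum


class LifecycleState(Enum):
    ACTIVE = "Active"
    DEPRECATED = "Deprecated"
    DELETED = "Deleted"


def in_search_results(state, include_deprecated=False, include_deleted=False):
    """Return True if an asset in `state` would appear in search results."""
    if state is LifecycleState.ACTIVE:
        return True  # always visible
    if state is LifecycleState.DEPRECATED:
        return include_deprecated  # assumed opt-in flag, not a documented filter
    return include_deleted  # mirrors the "Include deleted" filter
```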
Setting Deprecation
1. Open the asset in Calabi Catalogue.
2. Click Edit, then toggle Status to Deprecated.
3. Add a Deprecation Note explaining the reason and which asset consumers should use instead.
4. Click Save. Existing consumers of the asset receive a notification if notifications are configured in Calabi Automate.
Asset Discovery
Calabi Catalogue builds its asset inventory through two mechanisms:
Automated Ingestion via Calabi Connect
When you configure a source in Calabi Connect (e.g., Redshift, Snowflake, dbt), Calabi Catalogue automatically discovers and indexes all tables, views, columns, pipelines, and dashboards from that source. Discovery runs on the Calabi Connect sync schedule (typically daily).
What gets extracted automatically:
- Table and column names
- Data types
- Row counts (approximate)
- Sample values (for column profiling)
- Native database descriptions and comments
- dbt model descriptions and column-level documentation
- Column lineage (from dbt or query parsing)
Manual Registration
Assets that cannot be auto-discovered (e.g., an external report, a process outside Calabi) can be registered manually:
1. In Calabi Catalogue, click + New Asset.
2. Select the asset type (Dashboard, Report, ML Model, etc.).
3. Fill in the required fields: name, description, owner, tags.
4. Optionally add lineage links to other assets.
5. Click Publish.
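If you prepare registrations in bulk before entering them, checking the required fields up front can save a round trip. The sketch below assumes an asset is held as a plain dictionary. The field names follow the required fields listed above, while the dictionary layout and the helper name `missing_required_fields` are hypothetical.

```python
REQUIRED_FIELDS = ("name", "description", "owner", "tags")


def missing_required_fields(asset: dict) -> list:
    """Return the required fields that are absent or empty in `asset`."""
    return [field for field in REQUIRED_FIELDS if not asset.get(field)]


# Hypothetical draft of a manually registered report.
draft = {
    "type": "Report",
    "name": "Monthly Finance Report",
    "description": "",
    "owner": {"type": "team", "id": "finance"},
    "tags": ["Certified"],
}
print(missing_required_fields(draft))  # ['description']
```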
Governance: Discovery vs Governance
Calabi Catalogue supports two complementary governance postures:
| Posture | Goal | Tools Used |
|---|---|---|
| Discovery | Help users find the right asset for their needs. Make assets searchable, described, and understood. | Full-text search, tags, glossary, descriptions, ownership, popularity metrics |
| Governance | Ensure assets meet quality, compliance, and security standards before consumption. | Data quality tests, PII classification, deprecation workflow, data contracts, lineage impact analysis |
The two postures are not mutually exclusive: an asset can be in both at once. A table can be discoverable (found in search, described, owned) while also being under active governance (quality tests running, PII flagged, lineage tracked).
Discovery Features
- Full-text search across names, descriptions, column names, and tags.
- Faceted filtering by asset type, owner, domain, tag, database, and schema.
- Usage metrics — see how many queries reference this asset per week.
- Popularity ranking — frequently-queried assets rank higher in search results.
- Glossary links — assets linked to glossary terms surface when users search for business terms, not just technical names.
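To make the interaction between full-text search and popularity ranking concrete, here is a toy sketch. It is not the Catalogue's ranking algorithm, which this page does not describe. It assumes a case-insensitive substring match and orders matches by weekly query count; the `Asset` class and the sample catalog are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    description: str
    weekly_queries: int


def search(assets, term):
    """Return assets matching `term`, most frequently queried first."""
    needle = term.lower()
    matches = [
        a for a in assets
        if needle in a.name.lower() or needle in a.description.lower()
    ]
    return sorted(matches, key=lambda a: a.weekly_queries, reverse=True)


catalog = [
    Asset("staging.orders_tmp", "Scratch copy of orders", 3),
    Asset("prod.sales.fact_orders", "Order facts", 420),
    Asset("prod.hr.salaries", "Salary data", 15),
]
print([a.name for a in search(catalog, "orders")])
# ['prod.sales.fact_orders', 'staging.orders_tmp']
```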
Governance Features
- Data quality tests — define column-level, row-level, and referential integrity checks; see pass/fail history.
- PII classification — tag columns containing personally identifiable information for compliance tracking.
- Lineage — upstream and downstream dependency graph. Understand the impact before changing an asset.
- Data contracts — define the schema contract between a producer and its consumers. Get notified when the contract is at risk of breaking.
- Audit trail — every change to asset metadata is recorded with actor, timestamp, and before/after values.
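A data contract check can be pictured as a comparison between the agreed schema and the producer's current schema. The sketch below is an illustration under simple assumptions: a contract is a mapping of column name to type, removed columns and changed types count as violations, and added columns do not. The function name and sample schemas are hypothetical.

```python
def contract_violations(contract: dict, current_schema: dict) -> list:
    """Compare a contract (column -> type) against the producer's current schema."""
    violations = []
    for column, expected_type in contract.items():
        actual_type = current_schema.get(column)
        if actual_type is None:
            violations.append(f"missing column: {column}")
        elif actual_type != expected_type:
            violations.append(f"type changed: {column} {expected_type} -> {actual_type}")
    return violations


contract = {"order_id": "BIGINT", "amount": "DECIMAL(12,2)"}
current = {"order_id": "BIGINT", "amount": "FLOAT", "channel": "VARCHAR"}
print(contract_violations(contract, current))
# ['type changed: amount DECIMAL(12,2) -> FLOAT']
```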
Tags and Classifications
Tags are free-form labels that can be applied to any asset or column to enable filtering, grouping, and policy enforcement.
Built-In Classification Tags
| Tag | Meaning |
|---|---|
| PII | Contains personally identifiable information |
| Sensitive | Sensitive but not strictly PII (e.g., salary data) |
| Certified | Reviewed and certified by a Data Steward as reliable |
| Deprecated | Should not be used for new work |
| Experimental | Work in progress; not yet production-quality |
| Golden Record | The single authoritative source for this entity |
Custom Tags
Admins can create organization-specific tags in Calabi Catalogue → Tags → + New Tag. Tags can be organized into hierarchies (e.g., Compliance > GDPR, Compliance > HIPAA).
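Hierarchical tags make it possible to filter on a parent and pick up all of its children. The sketch below assumes tags are stored as path strings using the same " > " separator shown in the examples above. That storage format is an assumption, and `tag_matches` is a hypothetical helper.

```python
SEPARATOR = " > "


def tag_matches(tag: str, filter_tag: str) -> bool:
    """True if `tag` equals `filter_tag` or sits beneath it in the hierarchy."""
    return tag == filter_tag or tag.startswith(filter_tag + SEPARATOR)


asset_tags = ["Compliance > GDPR", "PII"]
print(any(tag_matches(t, "Compliance") for t in asset_tags))  # True
print(any(tag_matches(t, "Compliance > HIPAA") for t in asset_tags))  # False
```

Matching on the parent plus the separator, rather than on a bare prefix, keeps a tag such as "ComplianceOps" from matching a "Compliance" filter.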
Asset Lineage
Every data asset in Calabi Catalogue participates in the platform-wide lineage graph — a directed acyclic graph (DAG) that shows how data flows from source to destination.
Lineage enables:
- Impact analysis — before changing a table schema, see all downstream dashboards and models affected.
- Root cause analysis — when a dashboard shows wrong numbers, trace back to the source causing the issue.
- Compliance reporting — demonstrate data lineage for regulatory audits (GDPR, SOC 2, etc.).
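Impact analysis and root cause analysis are both graph traversals: the first walks downstream from an asset, the second walks upstream. The sketch below shows the idea with a breadth-first search over a small hand-written graph. The edges are hypothetical and reuse asset names from the examples on this page; this is not how the Catalogue stores lineage.

```python
from collections import deque

# Edges point downstream: asset -> assets that consume it (hypothetical graph).
LINEAGE = {
    "raw.orders": ["prod.sales.fact_orders"],
    "prod.sales.fact_orders": ["daily_dbt_run", "Q1 2026 Executive Summary"],
    "daily_dbt_run": ["CustomerChurnModel"],
}


def downstream(graph, start):
    """Impact analysis: breadth-first walk of everything reachable from `start`."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def upstream(graph, start):
    """Root cause analysis: the same walk over the reversed graph."""
    reversed_graph = {}
    for parent, children in graph.items():
        for child in children:
            reversed_graph.setdefault(child, []).append(parent)
    return downstream(reversed_graph, start)


print(downstream(LINEAGE, "prod.sales.fact_orders"))
# ['daily_dbt_run', 'Q1 2026 Executive Summary', 'CustomerChurnModel']
print(upstream(LINEAGE, "CustomerChurnModel"))
# ['daily_dbt_run', 'prod.sales.fact_orders', 'raw.orders']
```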
Related Pages
- Calabi Catalogue Overview — Full Catalogue feature documentation
- Roles & Permissions — Who can manage and govern assets
- AI Agent: Asking Questions — Query and explore assets via natural language