Calabi Pipelines — Pipeline Orchestration

Professional+

Calabi Pipelines is the orchestration engine that schedules and runs your data pipelines. It runs with the KubernetesExecutor: each task spawns its own isolated Kubernetes pod, providing strong task isolation and horizontal scalability.

Accessing Calabi Pipelines

Navigate to calabi.{domain}/airflow. Your Calabi SSO credentials work automatically.

Core Concepts

| Concept | Description |
| --- | --- |
| DAG | Directed Acyclic Graph — a pipeline defined as Python code |
| Task | One unit of work inside a DAG (run a dbt model, call an API, run a SQL query) |
| DAG Run | One execution of a DAG at a specific point in time |
| Task Instance | One execution of a task within a specific DAG run |
| Schedule | A cron expression or preset (e.g., @daily, 0 6 * * *) |
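To make these concepts concrete, here is a minimal DAG sketch using Airflow's TaskFlow API. The DAG id, task names, and schedule are illustrative only and do not correspond to any production pipeline; it assumes Apache Airflow 2.x.

```python
# Illustrative DAG only — names and schedule are hypothetical, not production pipelines.
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="example_daily_dag",      # hypothetical DAG id
    schedule="0 6 * * *",            # cron schedule: daily at 06:00 UTC
    start_date=datetime(2024, 1, 1),
    catchup=False,                   # do not backfill missed intervals
)
def example_daily_dag():
    @task
    def extract() -> list[int]:
        # One unit of work — under KubernetesExecutor this runs in its own pod.
        return [1, 2, 3]

    @task
    def load(rows: list[int]) -> None:
        print(f"loaded {len(rows)} rows")

    # Task dependency: extract >> load
    load(extract())


example_daily_dag()
```

Each scheduled execution of this DAG is a DAG run, and each run of `extract` or `load` within it is a task instance.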

Production DAGs

| DAG | Schedule | Purpose |
| --- | --- | --- |
| prod_data_unification_dag | Daily 06:00 UTC | Runs all dbt models in the data_unification project |
| dynamo_to_warehouse_incremental | Hourly | DynamoDB → data warehouse incremental sync |
| calabi_data_ingestion | Daily | Raw data preparation |
| psychometric_reporting_dag | Weekly (Sunday) | Psychometric scoring models |
| recon_asgard_production_validation | Daily | Data reconciliation validation |

KubernetesExecutor

Calabi runs Calabi Pipelines with the KubernetesExecutor.

Benefits:

  • Perfect isolation — one pod per task
  • No shared workers — tasks can't interfere with each other
  • Automatic resource cleanup after task completion
  • Scale to zero when no tasks are running
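Because each task gets its own pod, resource requests can be tuned per task rather than per worker. A standard way to do this with the KubernetesExecutor is Airflow's `executor_config` with a `pod_override`; the sketch below is a hypothetical example (the task name and resource values are assumptions, not taken from the production DAGs above).

```python
# Hypothetical per-task resource override for KubernetesExecutor.
# Values and task name are illustrative; assumes Airflow 2.x with the
# kubernetes provider installed.
from airflow.decorators import task
from kubernetes.client import models as k8s


@task(
    executor_config={
        "pod_override": k8s.V1Pod(
            spec=k8s.V1PodSpec(
                containers=[
                    k8s.V1Container(
                        # "base" is the container name Airflow uses for the task.
                        name="base",
                        resources=k8s.V1ResourceRequirements(
                            requests={"cpu": "500m", "memory": "512Mi"},
                            limits={"cpu": "1", "memory": "1Gi"},
                        ),
                    )
                ]
            )
        )
    }
)
def heavy_transform() -> None:
    # A memory-hungry task gets a larger pod; lighter tasks keep the defaults.
    ...
```

When the task's pod completes, its resources are released, which is what enables the scale-to-zero behaviour listed above.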