Sync Modes
Calabi Connect offers four sync modes that control how data is read from the source and written to the destination. Choosing the right mode for each stream is one of the most important configuration decisions you will make — it affects data freshness, warehouse storage costs, query performance, and the complexity of downstream transformation logic.
Mode Overview
| Mode | Source Read | Destination Write | Primary Key Required | Cursor Required |
|---|---|---|---|---|
| Full Refresh — Overwrite | All rows | Truncate + rewrite | No | No |
| Full Refresh — Append | All rows | Append without dedup | No | No |
| Incremental — Append | New/changed rows only | Append | No | Yes |
| Incremental — Append + Deduped | New/changed rows only | Upsert (merge by PK) | Yes | Yes |
Full Refresh — Overwrite
How It Works
On every sync, Calabi Connect reads all rows from the source stream and replaces the entire destination table. The destination table is dropped (or truncated) and repopulated from scratch.
When to Use
- Reference tables — currency codes, country lists, product catalogues with fewer than ~50K rows
- Slowly changing, small tables — where complete freshness is needed and storage cost is negligible
- Tables without a reliable cursor — some APIs cannot report what changed since the last sync
When NOT to Use
- Tables that grow continuously (event logs, transaction history) — the sync time and cost scale linearly with table size
- Tables where you need historical versions of records (overwrite destroys history)
Example Configuration
A currency_codes stream from a finance system:
Stream: currency_codes
Sync Mode: Full Refresh — Overwrite
Schedule: @daily
Downstream query is always a simple SELECT * FROM currency_codes — always current, no deduplication needed.
Full Refresh — Append
How It Works
On every sync, Calabi Connect reads all rows from the source and appends them to the destination table. No existing rows are deleted or modified. Each sync adds a complete snapshot of the source to the bottom of the destination table.
When to Use
- Audit tables — you need a timestamped record of every state the source data was ever in
- Slowly changing dimension tracking — capture the full history of how a record changed over time
- Event streaming that Calabi Connect sources expose as full-state — when incremental isn't available
Querying Full Refresh Append Tables
Because every sync produces a full copy, your downstream queries need to deduplicate by selecting the latest snapshot:
-- Get the current state of each customer from an append table
select *
from (
select *,
row_number() over (
partition by customer_id
order by _airbyte_extracted_at desc
) as rn
from raw.salesforce.customer
) t
where rn = 1
Calabi Transform's _airbyte_extracted_at metadata column (automatically added to all synced tables) timestamps each row's sync batch.
Incremental — Append
How It Works
Calabi Connect uses a cursor field — a column in the source table that monotonically increases as records are added or updated (typically updated_at, created_at, or modified_date). On each sync:
- Calabi Connect reads the current maximum cursor value from the state saved from the last sync.
- It fetches only rows from the source where
cursor_field > last_cursor_value. - New rows are appended to the destination. Existing rows are not modified.
Cursor Field Setup
Select the cursor field when configuring the stream in the connection settings. The cursor must:
- Be a datetime or monotonically increasing integer column
- Be updated whenever a row is created or modified
- Not be nullable (or nullable rows are excluded)
Good cursor fields:
updated_atmodified_at_etl_timestampcreated_at(for append-only sources where rows are never updated)
Poor cursor fields:
name(not monotonic)status(not monotonic)- Any nullable column
When to Use
- Event streams — clicks, page views, server logs where events are immutable once created
- Append-only audit logs — rows are never updated, only inserted
- Sources that don't support deduplication — when no reliable primary key exists
Limitation: Duplicates on Updates
If a row is updated in the source (e.g., a customer changes their email), incremental append will add a new row with the updated data. The old row remains. Downstream models must deduplicate.
Use Incremental — Append + Deduped if you need the destination to reflect the latest version of each record without duplicates.
Incremental — Append + Deduped
How It Works
This mode combines the efficiency of incremental syncing with deduplication logic:
- New/changed rows are synced incrementally (same as Incremental — Append).
- After syncing, Calabi Connect performs a deduplication step using the configured primary key, merging the raw append data into a final deduplicated table.
The destination contains two tables:
_airbyte_raw__stream_<name>— the raw append history (all batches)<name>— the deduplicated final table (one row per primary key)
Primary Key Requirements
The primary key must:
- Uniquely identify each record in the source
- Be stable — the value must not change over the life of a record
- Be present on every row (not nullable)
When to Use
This is the recommended mode for most operational tables. Use it for:
- CRM objects (contacts, accounts, deals, opportunities)
- Transactional tables (orders, invoices, subscriptions)
- Any table where records are updated after creation and you need the current state
- Any table with a natural primary key
Example
A Salesforce Account object synced to Snowflake:
Stream: Account
Sync Mode: Incremental — Append + Deduped
Cursor Field: SystemModstamp
Primary Key: Id
The destination table salesforce.account always has exactly one row per Salesforce account, showing the latest field values. Updated accounts are reflected on the next sync without accumulating duplicate rows.
Mode Selection Guide
Use this decision tree when configuring a stream:
Sync Mode Comparison by Use Case
| Use Case | Recommended Mode | Rationale |
|---|---|---|
| Country/currency lookup tables | Full Refresh — Overwrite | Small, stable, no history needed |
| Daily financial exchange rates (historical) | Full Refresh — Append | Audit trail of each day's rates |
| Salesforce contacts / accounts | Incremental — Append + Deduped | Large, frequently updated, has PK |
| Stripe payment events | Incremental — Append | Events are immutable; no PK needed |
| Website click events | Incremental — Append | Append-only, high volume |
| Jira issues (current state) | Incremental — Append + Deduped | Updated frequently; want current state |
| Product catalogue (<10K items) | Full Refresh — Overwrite | Small enough to rebuild every time |
| SaaS subscription records | Incremental — Append + Deduped | Subscriptions change; need current state |
Resetting Sync State
If you change the sync mode or cursor field for an existing stream, you must reset the sync state to force a full re-sync from the source. Without resetting, Calabi Connect may use a stale cursor value that causes rows to be missed or duplicated.
To reset:
- Navigate to Calabi Connect → Connections → [your connection].
- Click Schema and find the stream you changed.
- Click Reset Stream next to that stream (or Reset All Streams to reset the full connection).
- Click Save & Sync to trigger the resync.
Resetting a stream drops the destination table for that stream and rebuilds it from scratch. Any downstream tables or views that depend on it will be temporarily unavailable.
Metadata Columns
Calabi Connect automatically adds internal metadata columns to every synced table. These columns are useful for debugging and for building incremental Calabi Transform models.
| Column | Type | Description |
|---|---|---|
_airbyte_raw_id | VARCHAR | Unique identifier for this row's source record |
_airbyte_extracted_at | TIMESTAMP | When Calabi Connect ingested this row |
_airbyte_meta | JSON | Internal sync metadata (errors, source record metadata) |
Use _airbyte_extracted_at as the cursor in incremental Calabi Transform models that build on top of Calabi Connect tables:
-- models/staging/stg_sf_accounts.sql
{{ config(materialized='incremental', unique_key='account_id') }}
select
id as account_id,
name as account_name,
industry,
annual_revenue,
_airbyte_extracted_at as synced_at
from {{ source('salesforce', 'account') }}
{% if is_incremental() %}
where _airbyte_extracted_at > (select max(synced_at) from {{ this }})
{% endif %}
Next Steps
- Creating Connections — Set sync modes when configuring a connection
- Monitoring Sync Jobs — Track sync performance and catch failures
- Available Connectors — See which connectors support incremental sync