
Sync Modes

Available on Professional+ plans.

Calabi Connect offers four sync modes that control how data is read from the source and written to the destination. Choosing the right mode for each stream is one of the most important configuration decisions you will make — it affects data freshness, warehouse storage costs, query performance, and the complexity of downstream transformation logic.


Mode Overview

| Mode | Source Read | Destination Write | Primary Key Required | Cursor Required |
| --- | --- | --- | --- | --- |
| Full Refresh — Overwrite | All rows | Truncate + rewrite | No | No |
| Full Refresh — Append | All rows | Append without dedup | No | No |
| Incremental — Append | New/changed rows only | Append | No | Yes |
| Incremental — Append + Deduped | New/changed rows only | Upsert (merge by PK) | Yes | Yes |

Full Refresh — Overwrite

How It Works

On every sync, Calabi Connect reads all rows from the source stream and replaces the entire destination table. The destination table is dropped (or truncated) and repopulated from scratch.
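Conceptually, an overwrite sync behaves like the following SQL. This is an illustrative sketch only, not Calabi Connect's actual implementation; the analytics.currency_codes table and its columns are hypothetical names chosen for the example:

```sql
-- Illustrative sketch of one Full Refresh — Overwrite sync
begin;

truncate table analytics.currency_codes;

insert into analytics.currency_codes (code, name, symbol)
select code, name, symbol
from source_extract;  -- the complete snapshot read from the source this run

commit;
```

Because the whole table is rebuilt inside one transaction, readers never see a partially loaded state.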

When to Use

  • Reference tables — currency codes, country lists, product catalogues with fewer than ~50K rows
  • Slowly changing, small tables — where complete freshness is needed and storage cost is negligible
  • Tables without a reliable cursor — some APIs cannot report what changed since the last sync

When NOT to Use

  • Tables that grow continuously (event logs, transaction history) — the sync time and cost scale linearly with table size
  • Tables where you need historical versions of records (overwrite destroys history)

Example Configuration

A currency_codes stream from a finance system:

Stream: currency_codes
Sync Mode: Full Refresh — Overwrite
Schedule: @daily

Downstream queries are a simple SELECT * FROM currency_codes — the table is always current, and no deduplication is needed.


Full Refresh — Append

How It Works

On every sync, Calabi Connect reads all rows from the source and appends them to the destination table. No existing rows are deleted or modified. Each sync adds a complete snapshot of the source to the bottom of the destination table.
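Each append sync amounts to a plain insert of the new snapshot. The sketch below is illustrative, assuming a hypothetical raw.finance.exchange_rates table; it is not Calabi Connect's internal SQL:

```sql
-- Illustrative sketch — every sync appends one complete snapshot
insert into raw.finance.exchange_rates (currency, rate, _airbyte_extracted_at)
select currency, rate, current_timestamp
from source_snapshot;  -- the full state read from the source this run
```

After N syncs, the destination holds N complete copies of the source, each distinguishable by its extraction timestamp.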

When to Use

  • Audit tables — you need a timestamped record of every state the source data was ever in
  • Slowly changing dimension tracking — capture the full history of how a record changed over time
  • Full-state sources — some Calabi Connect sources only expose the complete current state; use this mode when incremental reads aren't available

Querying Full Refresh Append Tables

Because every sync produces a full copy, your downstream queries need to deduplicate by selecting the latest snapshot:

-- Get the current state of each customer from an append table
select *
from (
    select *,
           row_number() over (
               partition by customer_id
               order by _airbyte_extracted_at desc
           ) as rn
    from raw.salesforce.customer
) t
where rn = 1

Calabi Connect's _airbyte_extracted_at metadata column (automatically added to all synced tables) timestamps each row's sync batch.
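An equivalent way to read the current state is to keep only the most recent snapshot batch, filtering on the same _airbyte_extracted_at column (table name as in the example above):

```sql
-- Keep only rows from the most recent sync batch
select *
from raw.salesforce.customer
where _airbyte_extracted_at = (
    select max(_airbyte_extracted_at)
    from raw.salesforce.customer
);
```

This assumes every row in a batch shares the same _airbyte_extracted_at value; if batch timestamps can vary per row, the row_number() approach is the safer choice.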


Incremental — Append

How It Works

Calabi Connect uses a cursor field — a column in the source table that monotonically increases as records are added or updated (typically updated_at, created_at, or modified_date). On each sync:

  1. Calabi Connect reads the cursor value saved in state at the end of the previous sync.
  2. It fetches only rows from the source where cursor_field > last_cursor_value.
  3. New rows are appended to the destination. Existing rows are not modified.
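The steps above correspond to a source-side read like the following sketch, where :last_cursor_value stands in for the value Calabi Connect loads from saved state; the customers table and updated_at column are hypothetical names for illustration:

```sql
-- Illustrative extraction query for one incremental sync
select *
from customers
where updated_at > :last_cursor_value  -- cursor saved at the end of the previous sync
order by updated_at;                   -- lets state record the new maximum cursor
```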

Cursor Field Setup

Select the cursor field when configuring the stream in the connection settings. The cursor must:

  • Be a datetime or monotonically increasing integer column
  • Be updated whenever a row is created or modified
  • Not be nullable — rows with a null cursor value are excluded from the sync

Good cursor fields:

  • updated_at
  • modified_at
  • _etl_timestamp
  • created_at (for append-only sources where rows are never updated)

Poor cursor fields:

  • name (not monotonic)
  • status (not monotonic)
  • Any nullable column
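Before selecting a cursor, it can be worth checking it against the requirements above. A hedged sanity check, sketched for a hypothetical orders table:

```sql
-- A cursor with null values will silently exclude rows from incremental syncs
select count(*) as null_cursor_rows
from orders
where updated_at is null;  -- should be 0 for a safe cursor field
```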

When to Use

  • Event streams — clicks, page views, server logs where events are immutable once created
  • Append-only audit logs — rows are never updated, only inserted
  • Sources that don't support deduplication — when no reliable primary key exists

Limitation: Duplicates on Updates

If a row is updated in the source (e.g., a customer changes their email), incremental append will add a new row with the updated data. The old row remains. Downstream models must deduplicate.

Use Incremental — Append + Deduped if you need the destination to reflect the latest version of each record without duplicates.


Incremental — Append + Deduped

How It Works

This mode combines the efficiency of incremental syncing with deduplication logic:

  1. New/changed rows are synced incrementally (same as Incremental — Append).
  2. After syncing, Calabi Connect performs a deduplication step using the configured primary key, merging the raw append data into a final deduplicated table.

The destination contains two tables:

  • _airbyte_raw__stream_<name> — the raw append history (all batches)
  • <name> — the deduplicated final table (one row per primary key)
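The dedup step is conceptually a merge from the raw history into the final table by primary key — the latest raw row per key wins. The sketch below illustrates the upsert semantics for a hypothetical account stream with id, name, and industry columns (raw_account_history is an illustrative stand-in for the raw append table, not Calabi Connect's actual SQL or table name):

```sql
-- Illustrative upsert: latest raw row per primary key wins
merge into salesforce.account as tgt
using (
    select id, name, industry
    from (
        select *,
               row_number() over (
                   partition by id
                   order by _airbyte_extracted_at desc
               ) as rn
        from raw_account_history  -- hypothetical name for the raw append table
    ) ranked
    where rn = 1
) as src
on tgt.id = src.id
when matched then
    update set name = src.name, industry = src.industry
when not matched then
    insert (id, name, industry) values (src.id, src.name, src.industry);
```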

Primary Key Requirements

The primary key must:

  • Uniquely identify each record in the source
  • Be stable — the value must not change over the life of a record
  • Be present on every row (not nullable)
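A quick way to verify a candidate primary key against these requirements, sketched for a hypothetical orders table with an order_id column:

```sql
-- Duplicate or null keys here mean the column is unsafe as a primary key
select order_id, count(*) as occurrences
from orders
group by order_id
having count(*) > 1 or order_id is null;  -- expect zero rows
```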

When to Use

This is the recommended mode for most operational tables. Use it for:

  • CRM objects (contacts, accounts, deals, opportunities)
  • Transactional tables (orders, invoices, subscriptions)
  • Any table where records are updated after creation and you need the current state
  • Any table with a natural primary key

Example

A Salesforce Account object synced to Snowflake:

Stream: Account
Sync Mode: Incremental — Append + Deduped
Cursor Field: SystemModstamp
Primary Key: Id

The destination table salesforce.account always has exactly one row per Salesforce account, showing the latest field values. Updated accounts are reflected on the next sync without accumulating duplicate rows.


Mode Selection Guide

Use this decision tree when configuring a stream:

  1. Do you need to keep historical versions of records? If yes → choose an append mode: Full Refresh — Append for small tables, Incremental — Append for large or event tables.
  2. Does the source expose a reliable cursor field? If no → Full Refresh — Overwrite.
  3. Are records updated after creation, and is there a stable primary key? If yes → Incremental — Append + Deduped; if no → Incremental — Append.


Sync Mode Comparison by Use Case

| Use Case | Recommended Mode | Rationale |
| --- | --- | --- |
| Country/currency lookup tables | Full Refresh — Overwrite | Small, stable, no history needed |
| Daily financial exchange rates (historical) | Full Refresh — Append | Audit trail of each day's rates |
| Salesforce contacts / accounts | Incremental — Append + Deduped | Large, frequently updated, has PK |
| Stripe payment events | Incremental — Append | Events are immutable; no PK needed |
| Website click events | Incremental — Append | Append-only, high volume |
| Jira issues (current state) | Incremental — Append + Deduped | Updated frequently; want current state |
| Product catalogue (<10K items) | Full Refresh — Overwrite | Small enough to rebuild every time |
| SaaS subscription records | Incremental — Append + Deduped | Subscriptions change; need current state |

Resetting Sync State

If you change the sync mode or cursor field for an existing stream, you must reset the sync state to force a full re-sync from the source. Without resetting, Calabi Connect may use a stale cursor value that causes rows to be missed or duplicated.

To reset:

  1. Navigate to Calabi Connect → Connections → [your connection].
  2. Click Schema and find the stream you changed.
  3. Click Reset Stream next to that stream (or Reset All Streams to reset the full connection).
  4. Click Save & Sync to trigger the resync.

Warning: Reset drops destination data

Resetting a stream drops the destination table for that stream and rebuilds it from scratch. Any downstream tables or views that depend on it will be temporarily unavailable.


Metadata Columns

Calabi Connect automatically adds internal metadata columns to every synced table. These columns are useful for debugging and for building incremental Calabi Transform models.

| Column | Type | Description |
| --- | --- | --- |
| _airbyte_raw_id | VARCHAR | Unique identifier for this row's source record |
| _airbyte_extracted_at | TIMESTAMP | When Calabi Connect ingested this row |
| _airbyte_meta | JSON | Internal sync metadata (errors, source record metadata) |

Use _airbyte_extracted_at as the cursor in incremental Calabi Transform models that build on top of Calabi Connect tables:

-- models/staging/stg_sf_accounts.sql
{{ config(materialized='incremental', unique_key='account_id') }}

select
    id as account_id,
    name as account_name,
    industry,
    annual_revenue,
    _airbyte_extracted_at as synced_at
from {{ source('salesforce', 'account') }}

{% if is_incremental() %}
where _airbyte_extracted_at > (select max(synced_at) from {{ this }})
{% endif %}

Next Steps