
Sync Modes

Available on Professional+ plans.

Calabi Connect offers four sync modes that control how data is read from the source and written to the destination. Choosing the right mode for each stream is one of the most important configuration decisions you will make — it affects data freshness, warehouse storage costs, query performance, and the complexity of downstream transformation logic.


Mode Overview

| Mode | Source Read | Destination Write | Primary Key Required | Cursor Required |
| --- | --- | --- | --- | --- |
| Full Refresh — Overwrite | All rows | Truncate + rewrite | No | No |
| Full Refresh — Append | All rows | Append without dedup | No | No |
| Incremental — Append | New/changed rows only | Append | No | Yes |
| Incremental — Append + Deduped | New/changed rows only | Upsert (merge by PK) | Yes | Yes |

Full Refresh — Overwrite

How It Works

On every sync, Calabi Connect reads all rows from the source stream and replaces the entire destination table. The destination table is dropped (or truncated) and repopulated from scratch.
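Conceptually, an overwrite sync behaves like the following SQL. This is an illustrative sketch only, not Calabi Connect's actual implementation; the analytics.currency_codes table and its columns are hypothetical names chosen for the example:

```sql
-- Illustrative sketch of one Full Refresh — Overwrite sync
begin;

truncate table analytics.currency_codes;

insert into analytics.currency_codes (code, name, symbol)
select code, name, symbol
from source_extract;  -- the complete snapshot read from the source this run

commit;
```

Because the whole table is rebuilt inside one transaction, readers never see a partially loaded state.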

When to Use

  • Reference tables — currency codes, country lists, product catalogues with fewer than ~50K rows
  • Slowly changing, small tables — where complete freshness is needed and storage cost is negligible
  • Tables without a reliable cursor — some APIs cannot report what changed since the last sync

When NOT to Use

  • Tables that grow continuously (event logs, transaction history) — the sync time and cost scale linearly with table size
  • Tables where you need historical versions of records (overwrite destroys history)

Example Configuration

A currency_codes stream from a finance system:

Stream: currency_codes
Sync Mode: Full Refresh — Overwrite
Schedule: @daily

Downstream queries are a simple SELECT * FROM currency_codes — the table is always current, and no deduplication is needed.


Full Refresh — Append

How It Works

On every sync, Calabi Connect reads all rows from the source and appends them to the destination table. No existing rows are deleted or modified. Each sync adds a complete snapshot of the source to the bottom of the destination table.
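Each append sync amounts to a plain insert of the new snapshot. The sketch below is illustrative, assuming a hypothetical raw.finance.exchange_rates table; it is not Calabi Connect's internal SQL:

```sql
-- Illustrative sketch — every sync appends one complete snapshot
insert into raw.finance.exchange_rates (currency, rate, _airbyte_extracted_at)
select currency, rate, current_timestamp
from source_snapshot;  -- the full state read from the source this run
```

After N syncs, the destination holds N complete copies of the source, each distinguishable by its extraction timestamp.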

When to Use

  • Audit tables — you need a timestamped record of every state the source data was ever in
  • Slowly changing dimension tracking — capture the full history of how a record changed over time
  • Full-state sources — some Calabi Connect sources only expose the complete current state; use this mode when incremental reads aren't available

Querying Full Refresh Append Tables

Because every sync produces a full copy, your downstream queries need to deduplicate by selecting the latest snapshot:

-- Get the current state of each customer from an append table
select *
from (
    select *,
           row_number() over (
               partition by customer_id
               order by _airbyte_extracted_at desc
           ) as rn
    from raw.salesforce.customer
) t
where rn = 1

Calabi Connect's _airbyte_extracted_at metadata column (automatically added to all synced tables) timestamps each row's sync batch.
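An equivalent way to read the current state is to keep only the most recent snapshot batch, filtering on the same _airbyte_extracted_at column (table name as in the example above):

```sql
-- Keep only rows from the most recent sync batch
select *
from raw.salesforce.customer
where _airbyte_extracted_at = (
    select max(_airbyte_extracted_at)
    from raw.salesforce.customer
);
```

This assumes every row in a batch shares the same _airbyte_extracted_at value; if batch timestamps can vary per row, the row_number() approach is the safer choice.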


Incremental — Append

How It Works

Calabi Connect uses a cursor field — a column in the source table that monotonically increases as records are added or updated (typically updated_at, created_at, or modified_date). On each sync:

  1. Calabi Connect reads the cursor value saved in state at the end of the previous sync.
  2. It fetches only rows from the source where cursor_field > last_cursor_value.
  3. New rows are appended to the destination. Existing rows are not modified.
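The steps above correspond to a source-side read like the following sketch, where :last_cursor_value stands in for the value Calabi Connect loads from saved state; the customers table and updated_at column are hypothetical names for illustration:

```sql
-- Illustrative extraction query for one incremental sync
select *
from customers
where updated_at > :last_cursor_value  -- cursor saved at the end of the previous sync
order by updated_at;                   -- lets state record the new maximum cursor
```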

Cursor Field Setup

Select the cursor field when configuring the stream in the connection settings. The cursor must:

  • Be a datetime or monotonically increasing integer column
  • Be updated whenever a row is created or modified
  • Not be nullable — rows with a null cursor value are excluded from the sync

Good cursor fields:

  • updated_at
  • modified_at
  • _etl_timestamp
  • created_at (for append-only sources where rows are never updated)

Poor cursor fields:

  • name (not monotonic)
  • status (not monotonic)
  • Any nullable column
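Before selecting a cursor, it can be worth checking it against the requirements above. A hedged sanity check, sketched for a hypothetical orders table:

```sql
-- A cursor with null values will silently exclude rows from incremental syncs
select count(*) as null_cursor_rows
from orders
where updated_at is null;  -- should be 0 for a safe cursor field
```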

When to Use

  • Event streams — clicks, page views, server logs where events are immutable once created
  • Append-only audit logs — rows are never updated, only inserted
  • Sources that don't support deduplication — when no reliable primary key exists

Limitation: Duplicates on Updates

If a row is updated in the source (e.g., a customer changes their email), incremental append will add a new row with the updated data. The old row remains. Downstream models must deduplicate.

Use Incremental — Append + Deduped if you need the destination to reflect the latest version of each record without duplicates.


Incremental — Append + Deduped

How It Works

This mode combines the efficiency of incremental syncing with deduplication logic:

  1. New/changed rows are synced incrementally (same as Incremental — Append).
  2. After syncing, Calabi Connect performs a deduplication step using the configured primary key, merging the raw append data into a final deduplicated table.

The destination contains two tables:

  • _airbyte_raw__stream_<name> — the raw append history (all batches)
  • <name> — the deduplicated final table (one row per primary key)
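The dedup step is conceptually a merge from the raw history into the final table by primary key — the latest raw row per key wins. The sketch below illustrates the upsert semantics for a hypothetical account stream with id, name, and industry columns (raw_account_history is an illustrative stand-in for the raw append table, not Calabi Connect's actual SQL or table name):

```sql
-- Illustrative upsert: latest raw row per primary key wins
merge into salesforce.account as tgt
using (
    select id, name, industry
    from (
        select *,
               row_number() over (
                   partition by id
                   order by _airbyte_extracted_at desc
               ) as rn
        from raw_account_history  -- hypothetical name for the raw append table
    ) ranked
    where rn = 1
) as src
on tgt.id = src.id
when matched then
    update set name = src.name, industry = src.industry
when not matched then
    insert (id, name, industry) values (src.id, src.name, src.industry);
```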

Primary Key Requirements

The primary key must:

  • Uniquely identify each record in the source
  • Be stable — the value must not change over the life of a record
  • Be present on every row (not nullable)
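A quick way to verify a candidate primary key against these requirements, sketched for a hypothetical orders table with an order_id column:

```sql
-- Duplicate or null keys here mean the column is unsafe as a primary key
select order_id, count(*) as occurrences
from orders
group by order_id
having count(*) > 1 or order_id is null;  -- expect zero rows
```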

When to Use

This is the recommended mode for most operational tables. Use it for:

  • CRM objects (contacts, accounts, deals, opportunities)
  • Transactional tables (orders, invoices, subscriptions)
  • Any table where records are updated after creation and you need the current state
  • Any table with a natural primary key

Example

A Salesforce Account object synced to Snowflake:

Stream: Account
Sync Mode: Incremental — Append + Deduped
Cursor Field: SystemModstamp
Primary Key: Id

The destination table salesforce.account always has exactly one row per Salesforce account, showing the latest field values. Updated accounts are reflected on the next sync without accumulating duplicate rows.


Mode Selection Guide

Use this decision tree when configuring a stream:

  1. Do you need to keep historical versions of records? If yes → choose an append mode: Full Refresh — Append for small tables, Incremental — Append for large or event tables.
  2. Does the source expose a reliable cursor field? If no → Full Refresh — Overwrite.
  3. Are records updated after creation, and is there a stable primary key? If yes → Incremental — Append + Deduped; if no → Incremental — Append.


Sync Mode Comparison by Use Case

| Use Case | Recommended Mode | Rationale |
| --- | --- | --- |
| Country/currency lookup tables | Full Refresh — Overwrite | Small, stable, no history needed |
| Daily financial exchange rates (historical) | Full Refresh — Append | Audit trail of each day's rates |
| Salesforce contacts / accounts | Incremental — Append + Deduped | Large, frequently updated, has PK |
| Stripe payment events | Incremental — Append | Events are immutable; no PK needed |
| Website click events | Incremental — Append | Append-only, high volume |
| Jira issues (current state) | Incremental — Append + Deduped | Updated frequently; want current state |
| Product catalogue (<10K items) | Full Refresh — Overwrite | Small enough to rebuild every time |
| SaaS subscription records | Incremental — Append + Deduped | Subscriptions change; need current state |

Resetting Sync State

If you change the sync mode or cursor field for an existing stream, you must reset the sync state to force a full re-sync from the source. Without resetting, Calabi Connect may use a stale cursor value that causes rows to be missed or duplicated.

To reset:

  1. Navigate to Calabi Connect → Connections → [your connection].
  2. Click Schema and find the stream you changed.
  3. Click Reset Stream next to that stream (or Reset All Streams to reset the full connection).
  4. Click Save & Sync to trigger the resync.

Warning: Reset drops destination data

Resetting a stream drops the destination table for that stream and rebuilds it from scratch. Any downstream tables or views that depend on it will be temporarily unavailable.


Metadata Columns

Calabi Connect automatically adds internal metadata columns to every synced table. These columns are useful for debugging and for building incremental Calabi Transform models.

| Column | Type | Description |
| --- | --- | --- |
| _airbyte_raw_id | VARCHAR | Unique identifier for this row's source record |
| _airbyte_extracted_at | TIMESTAMP | When Calabi Connect ingested this row |
| _airbyte_meta | JSON | Internal sync metadata (errors, source record metadata) |

Use _airbyte_extracted_at as the cursor in incremental Calabi Transform models that build on top of Calabi Connect tables:

-- models/staging/stg_sf_accounts.sql
{{ config(materialized='incremental', unique_key='account_id') }}

select
    id as account_id,
    name as account_name,
    industry,
    annual_revenue,
    _airbyte_extracted_at as synced_at
from {{ source('salesforce', 'account') }}

{% if is_incremental() %}
where _airbyte_extracted_at > (select max(synced_at) from {{ this }})
{% endif %}

Next Steps