Skip to main content

Clickhouse

The ClickHouse destination connector syncs data from Calabi Connect sources to ClickHouse, a high-performance columnar database designed for online analytical processing (OLAP). This connector writes data directly to ClickHouse tables with proper typing, enabling fast analytical queries on your replicated data.

This is a complete rewrite of the ClickHouse destination connector built on Calabi Connect's Bulk CDK framework, replacing the legacy v1 connector.

How version 2 improves on version 1

Version 2.0.0 represents a complete architectural redesign of the ClickHouse destination connector with significant improvements:

  • All sync modes supported: Full Refresh (Overwrite and Append) and Incremental (Append and Append + Deduped) sync modes are now fully supported.
  • [Direct Load] with typed columns: Calabi Connect writes data directly to typed columns matching your source schema, rather than storing everything as JSON in raw tables. This improves query performance and reduces storage requirements.
  • Improved performance: The new architecture uses ClickHouse's native binary protocol and batch inserts for faster data loading.
  • Active maintenance: Built on Calabi Connect's modern CDK framework with ongoing development and support from the Calabi Connect team.

Supported sync modes

Sync modeSupported?
Full Refresh - OverwriteYes
Full Refresh - AppendYes
Full Refresh - Overwrite + DedupedYes
Incremental Sync - AppendYes
Incremental Sync - Append + DedupedYes

Deduplication leverages ClickHouse's ReplacingMergeTree table engine. See Deduplication below for details.

Deduplication

For optimal deduplication in Incremental - Append + Deduped sync mode, use a cursor column with one of these types:

  • Integer types (Int64, etc.)
  • Date
  • Timestamp (DateTime64)

If you use a different cursor column type, like string, the connector falls back to using the _airbyte_extracted_at timestamp for deduplication ordering. This fallback may not accurately reflect the natural ordering of your source data, and you'll see a warning in the sync logs.

warning

Calabi Connect's ClickHouse connector leverages the ReplacingMergeTree table engine to handle deduplication. To guarantee deduplicated results at query time, you can add the FINAL operator to your query string. For example:

SELECT * FROM your_table FINAL

Without this, you may see duplicated or deleted results when querying your data.

Alternatively, you may also be able to tune your merge settings to better match your query patterns.

Requirements

To use the ClickHouse destination connector, you need:

  • A ClickHouse instance (ClickHouse Cloud or self-hosted)
  • ClickHouse server version 21.8.10.19 or later
  • Network access from Calabi Connect to your ClickHouse instance
  • A ClickHouse user with appropriate permissions (see below)

Setup guide

1. Configure network access

Ensure your ClickHouse database is accessible from Calabi Connect.

Calabi Connect deploymentClickhouse deploymentDo this
CloudCloudWhitelist Calabi Connect's [IP addresses] in your ClickHouse Cloud settings.
CloudSelf-managedConfigure your firewall to allow inbound connections on port 8443 (HTTPS) or 8123 (HTTP) from Calabi Connect's [IP addresses].
Self-managedCloudWhitelist your Calabi Connect server's public IP address in ClickHouse Cloud settings.
Self-managedSelf-managedEnsure port 8443 (HTTPS) or 8123 (HTTP) is accessible from your Calabi Connect host. If both are in the same private network, configure security groups or firewall rules to allow traffic between them.

If you can't expose ClickHouse publicly, use SSH Tunneling via a bastion host that can reach ClickHouse.

2. Create a dedicated user with permissions

tip

It's best to create a dedicated ClickHouse user for Calabi Connect rather than using an existing user. This improves security and makes it easier to audit Calabi Connect's database operations.

Create a ClickHouse user for Calabi Connect with the following permissions:

  • Create and manage databases
  • Create, alter, drop, and truncate tables
  • Insert and select data

To create a user with the required permissions, run the following SQL commands in your ClickHouse instance:

-- Create the user (replace 'your_password' with a secure password)
CREATE USER airbyte_user IDENTIFIED BY 'your_password';

-- If async_insert is enabled in ClickHouse, disable it for the Airbyte user to ensure connection checks and data syncs work correctly. This fixes the "Error: Failed to insert expected rows into check table. Actual written: 0" error.
ALTER USER airbyte_user SETTINGS async_insert = 0;

-- Grant permissions on the default database
GRANT CREATE ON * TO airbyte_user;
GRANT CREATE ON {database}.* TO airbyte_user;
GRANT ALTER ON {database}.* TO airbyte_user;
GRANT TRUNCATE ON {database}.* TO airbyte_user;
GRANT INSERT ON {database}.* TO airbyte_user;
GRANT SELECT ON {database}.* TO airbyte_user;
GRANT CREATE DATABASE ON {database}.* TO airbyte_user;
GRANT CREATE TABLE ON {database}.* TO airbyte_user;
GRANT DROP TABLE ON {database}.* TO airbyte_user;

Replace {database} with the database name you configure in the connector settings. It's typically default.

If you configure custom namespaces in your Calabi Connect connections, grant permissions for each namespace:

GRANT CREATE ON {namespace}.* TO airbyte_user;
GRANT ALTER ON {namespace}.* TO airbyte_user;
GRANT TRUNCATE ON {namespace}.* TO airbyte_user;
GRANT INSERT ON {namespace}.* TO airbyte_user;
GRANT SELECT ON {namespace}.* TO airbyte_user;
GRANT CREATE DATABASE ON {namespace}.* TO airbyte_user;
GRANT CREATE TABLE ON {namespace}.* TO airbyte_user;
GRANT DROP TABLE ON {namespace}.* TO airbyte_user;

Replace {namespace} with each custom namespace you plan to use.

3. Configure the connector

  1. In Calabi Connect, click Destinations > ClickHouse.

  2. Configure the destination with the following information.

    • Hostname: Your ClickHouse server hostname (without protocol prefix like http:// or https://)
    • Port: HTTP port for ClickHouse (defaults are 8123 for HTTP and 8443 for HTTPS)
    • Protocol (self-hosted only): Choose HTTP or HTTPS. In Calabi Connect, this option is hidden and managed by the platform.
    • Database: Target database name (default: default)
    • Username: The ClickHouse user you created (for example, airbyte_user)
    • Password: The password for the ClickHouse user
    • Enable JSON: Whether to use ClickHouse's JSON type for object fields (recommended if your ClickHouse version supports it)
    • Record Window Size (advanced): The maximum number of records to write in a single batch. Tuning this parameter can impact performance. The batch size is also limited to 70 MB regardless of this setting. Most users don't need to change this value.

4. SSH tunnel (optional)

warning

SSH tunneling support is currently in Beta.

If your ClickHouse instance isn't directly accessible from Calabi Connect, you can use SSH tunneling to establish a secure connection. Configure the SSH tunnel settings in the connector configuration with your SSH host, port, username, and authentication method (password or private key).

Output schema

Calabi Connect writes each stream to its own table in ClickHouse. It creates tables in either the configured default database, typically default, or in a database corresponding to the namespace you specify for the stream when you set up your connection.

The connector converts Calabi Connect data types to ClickHouse types as follows:

  • Decimal types → Decimal(38, 9) (38 digit precision with 9 decimal places)
  • Timestamp types → DateTime64(3) (millisecond precision)
  • Object types → JSON if you enable JSON in the connector configuration, otherwise → String
  • Integer types → Int64
  • Boolean types → Bool
  • String types → String
  • Union types → String
  • Array types → String
note

The connector converts arrays and unions to strings for compatibility. If you need to query these as structured data, use ClickHouse's JSON functions to parse the string values.

Namespace support

This destination supports namespaces. The namespace maps to a ClickHouse database.