Integration Development Best Practices
This section is for developers building integrations on top of ClickHouse — BI tool connectors, ETL/ELT pipelines, data catalog adapters, IDE plugins, or any software that connects to ClickHouse on behalf of end users.
If you are a developer writing an application that happens to use ClickHouse directly, the language client documentation is a better starting point.
Start here based on what you are building:
| I am building... | Start with |
|---|---|
| A data pipeline or ETL connector that writes to ClickHouse | Ingestion patterns |
| A BI tool, query builder, or data catalog that reads from ClickHouse | Consumption patterns |
| An AI agent or LLM-powered tool | MCP integration guide |
| A Java/JDBC connector | JDBC connector guide |
| A Python integration | Python connector guide |
| A Go integration | Go connector guide |
| A JavaScript/TypeScript integration | JavaScript connector guide |
The rest of this page covers protocol selection, authentication, and a full gotcha reference and checklist that apply regardless of language.
Choosing a connectivity protocol
ClickHouse exposes several interfaces. Choose the one that best fits your integration's language and architecture:
| Protocol | Port | Best for |
|---|---|---|
| HTTP API | 8123 (HTTP), 8443 (HTTPS) | Custom connectors in any language, REST-oriented tools, maximum control over format and compression |
| JDBC | 8123 / 8443 | Java-based integrations, BI tools and frameworks that expect a standard JDBC DataSource |
| ODBC | 8123 / 8443 | Windows-native tools, Excel, legacy enterprise software |
| Native TCP | 9000 / 9440 | Internal tooling where maximum throughput is required; not recommended for external integrations as the protocol is not versioned for stability |
| MySQL wire protocol | 9004 | Tools with built-in MySQL support and no ClickHouse driver available |
| PostgreSQL wire protocol | 9005 | Tools with built-in PostgreSQL support and no ClickHouse driver available |
HTTP API is the recommended foundation for new integrations. It is stable, format-agnostic, works from any language, and exposes the full feature set of ClickHouse. JDBC is the best choice when your integration targets the Java ecosystem or needs to interoperate with JDBC-aware frameworks.
ClickHouse Cloud exposes only the HTTPS port (8443) and the secure native port (9440). Plaintext HTTP connections are not accepted. Design your integration to require TLS from the start.
Authentication
ClickHouse uses username and password authentication. Pass credentials either as HTTP Basic Auth headers or as URL parameters.
HTTP API
JDBC
Never embed credentials in connection URLs that may appear in logs or error messages. Read them from environment variables or a secrets manager.
Dedicated service accounts
Create a dedicated ClickHouse user for your integration with only the permissions it needs. Avoid connecting as default:
SSL/TLS
Always use TLS for connections to ClickHouse Cloud and strongly recommended for self-managed production clusters. For HTTPS connections, sslmode=strict (the default) verifies the server certificate. Use sslmode=none only in isolated development environments — never in production or user-facing integrations.
Identifying your integration
Always identify your integration in the HTTP User-Agent header and in per-query metadata. This makes queries attributable in system.query_log, which is invaluable for debugging customer issues and monitoring usage. Do this from day one — it is much harder to add retroactively once users are in production.
Set a User-Agent header
Use the format product/version (context):
Via JDBC:
Tag individual queries with log_comment
Attach operation context to each query so customer support and the customer themselves can filter system.query_log by feature or job:
Query your tagged queries in system.query_log
Schema discovery
ClickHouse exposes rich metadata through system.* tables. Use these to enumerate databases, tables, columns, and other objects for features like schema browsers, column pickers, and query editors.
Listing databases
Listing tables
Listing columns
INFORMATION_SCHEMA
ClickHouse also implements INFORMATION_SCHEMA for compatibility with tools that use standard SQL introspection queries. It covers a subset of metadata and is suitable when portability across databases matters:
Prefer system.columns over INFORMATION_SCHEMA.columns for ClickHouse-specific metadata like is_in_sorting_key and is_in_primary_key, which are important for generating efficient queries. Also note that INFORMATION_SCHEMA.TABLES.TABLE_ROWS is always NULL in ClickHouse — use system.tables.total_rows instead for row count estimates. INFORMATION_SCHEMA also has no foreign key data (KEY_COLUMN_USAGE and REFERENTIAL_CONSTRAINTS are always empty — ClickHouse has no foreign key constraints).
Data type mapping
ClickHouse has a richer type system than most databases your integration will encounter. The following sections cover the types that require special handling.
Numeric types
ClickHouse signed integers map predictably. Unsigned integers require care — they exceed the range of their same-width signed Java/SQL counterparts and must be promoted:
| ClickHouse Type | Recommended Java/SQL mapping |
|---|---|
| Int8, Int16, Int32, Int64 | Byte, Short, Integer, Long |
| Int128, Int256 | BigInteger |
| UInt8 | Short |
| UInt16 | Integer |
| UInt32 | Long |
| UInt64 | BigInteger |
| UInt128, UInt256 | BigInteger |
| Float32, Float64 | Float, Double |
| Decimal32/64/128/256 | BigDecimal |
String types
String in ClickHouse is a raw byte sequence with no enforced encoding (UTF-8 is conventional). FixedString(N) is zero-padded to N bytes when read — strip trailing null bytes (\0) before displaying values to users.
Date and time types
| ClickHouse Type | Notes |
|---|---|
Date | Days since 1970-01-01. No timezone. Map to LocalDate. |
Date32 | Extended date range. No timezone. Map to LocalDate. |
DateTime | Unix timestamp in seconds, stored with optional server/session timezone. Map to Instant or ZonedDateTime. |
DateTime64(n) | Sub-second precision (n = 0–9). Same timezone behavior as DateTime. |
DateTime and DateTime64 values are stored as UTC internally. String representation depends on the server timezone and any timezone defined on the column. When the column has no explicit timezone, the server timezone is used for both reading and writing string values.
session_timezone is an experimental setting that can invalidate partition pruning on DateTime columns. Do not use it in production. Convert timestamps to UTC in your application layer instead, or pass them as Unix epoch values using fromUnixTimestamp64Nano() to avoid any ambiguity.
Type modifiers
Two modifiers wrap other types and must be handled by your integration:
Nullable(T)— the column may contain NULL. Affects the type name returned in metadata (e.g.,Nullable(Int32)). Strip the wrapper when mapping to target types.LowCardinality(T)— a dictionary-encoded form of T, used for performance. Treat identically to the underlying type for all practical purposes.
Complex types
| ClickHouse Type | Behavior |
|---|---|
Array(T) | Nested arrays supported. Via JDBC, returned as java.sql.Array. Via HTTP, returned as JSON arrays. |
Map(K, V) | Key-value pairs. Via HTTP, returned as a JSON object. |
Tuple(T1, T2, ...) | Fixed-length heterogeneous sequence. Via HTTP, returned as a JSON array. |
Enum8, Enum16 | Returned as strings by default. Can be read as their underlying integer. |
UUID | Returned as a string in xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx format over HTTP. |
IPv4, IPv6 | Returned as dotted-decimal / colon-hex strings over HTTP. |
JSON | Returned as a JSON string. Supported in newer ClickHouse versions. |
AggregateFunction columns hold binary-encoded aggregate state. They are not human-readable and should generally be excluded from column listings exposed to end users.
Query patterns
Use parameterized queries
Never concatenate user input into SQL strings. ClickHouse supports named query parameters that prevent SQL injection and improve query plan reuse.
HTTP API:
Parameters are declared as {name:type} in the query and passed as param_name HTTP parameters.
JDBC:
Assign a query_id to every query
Set a deterministic query_id on each request. This lets you trace queries in system.query_log, cancel runaway queries, and implement idempotent retry logic.
HTTP API:
JDBC:
If you retry a query after a timeout, reuse the same query_id. ClickHouse will return the result of the already-running query rather than executing it twice.
Choose the right output format
ClickHouse supports dozens of output formats. Choose based on what your integration needs:
| Format | Use when |
|---|---|
JSONEachRow | Streaming row-by-row JSON; easy to parse incrementally |
JSONCompact | Compact JSON with column names in a header; smaller than full JSON |
CSV / TSV | Interoperability with spreadsheets and generic tools |
Parquet | Columnar data exchange with data lakehouse tools |
RowBinary / Native | Maximum throughput; binary format requiring type-aware deserialization |
Arrow | In-memory columnar exchange with Apache Arrow-compatible tools |
For most connector use cases, JSONEachRow is the best default: it is easy to stream and parse without buffering the full response.
Enable compression
LZ4 compression over HTTP significantly reduces network transfer for large result sets and inserts, at negligible CPU cost. Enable it by sending the Accept-Encoding header:
Paginate large result sets
ClickHouse does not have native cursor-based pagination. Use LIMIT / OFFSET for small datasets, or keyset pagination for large ones:
For very large exports, prefer streaming: consume JSONEachRow output incrementally rather than paginating.
Identifiers are case-sensitive
ClickHouse database names, table names, column names, and most function names are case-sensitive. A column defined as userId is different from userid. BI tools and ORMs that auto-capitalize identifiers will get Unknown column or Unknown function errors. Establish a lowercase or snake_case naming convention for all ClickHouse objects in integrations you control, and document this expectation for users creating tables.
64-bit integers in JSON output
If your integration consumes ClickHouse HTTP responses in JavaScript or TypeScript, note that Int64 and UInt64 values in JSON output are sent as numbers by default. JavaScript's JSON.parse() silently loses precision for integers beyond 2^53. Add output_format_json_quote_64bit_integers=1 to query parameters to receive them as quoted strings, then parse with a BigInt-aware library:
Set query timeouts
Protect your integration from runaway queries. Pass max_execution_time (seconds) as a query setting:
Writing data
Use batch inserts
Always batch rows into a single INSERT rather than sending one row per request. The HTTP API accepts INSERT data as the request body:
Aim for batches of 10,000–100,000 rows per request for optimal throughput. Smaller batches are the single most common production failure mode for connector-driven ingest: every INSERT creates a new on-disk data part, and ClickHouse merges parts asynchronously. If inserts arrive faster than merges complete, the part count crosses a threshold (default 300 active parts per partition) and ClickHouse raises Too many parts errors. Larger batches increase memory pressure. Never issue INSERT more than 1–2 times per second per table.
JDBC batch inserts
Async insert behavior
async_insert=1 buffers inserts server-side and flushes in bulk. Two important behaviors to understand:
- With
wait_for_async_insert=0(fire-and-forget), the server returns HTTP 200 immediately — before the data is validated or written. If a row has a type error, the entire buffered batch is silently dropped with no error surfaced to the client. Usewait_for_async_insert=1in latency-tolerant pipelines to get synchronous error feedback. - The
SETTINGSclause must appear beforeVALUESin an INSERT statement.INSERT INTO t SETTINGS async_insert=1 VALUES (?,?)works;INSERT INTO t VALUES (?,?) SETTINGS async_insert=1does not — the server fails to detect async-insert mode and parts accumulate rapidly.
JDBC PreparedStatement batches
In JDBC driver versions 0.8.6 and above, reusing a PreparedStatement across multiple executeBatch() calls re-inserts data from all previous batches due to a batch buffer not being cleared between executions. Create a new PreparedStatement instance for each batch cycle:
Materialized views inflate row counts
After creating a Materialized View on a source table, the rows_affected count returned by an INSERT includes rows written to both the source table and all MV target tables. An insert of 1,000 rows returns 2,000 rows affected with one MV. Row-count-based validation in ETL pipelines will fail. Use SELECT count() FROM source_table WHERE ... post-insert for validation instead.
Mutations are asynchronous
ALTER TABLE ... DELETE and ALTER TABLE ... UPDATE are asynchronous mutations. They return success immediately but run in the background as data parts are rewritten. A SELECT immediately after a DELETE may still return the deleted rows. For upsert patterns, use ReplacingMergeTree or CollapsingMergeTree instead of mutations. Poll system.mutations WHERE is_done = 0 if you need to wait for completion.
ALTER TABLE DELETE also does not clear deduplication checksums. If you delete rows and retry the same insert with the same content, ClickHouse will silently deduplicate and re-insert nothing. Use a new insert_deduplication_token value when re-inserting after deletions.
ClickHouse has no ACID transactions
Do not rely on BEGIN / COMMIT / ROLLBACK for data consistency. ClickHouse is append-optimized. Design writes to be idempotent:
- Use
insert_deduplication_tokento make retries safe — ClickHouse will deduplicate inserts with the same token within a configurable window:
- Use
ReplacingMergeTreefor upsert semantics — the engine merges duplicate primary keys, keeping the row with the highest version value.
JSON inserts and DEFAULT expressions
When inserting JSON with input_format_skip_unknown_fields=1, columns absent from the payload receive the type default (0, empty string) — not the column's DDL DEFAULT expression. If your schema uses DEFAULT now() or DEFAULT generateUUIDv4(), those expressions are only invoked when you also set input_format_defaults_for_omitted_fields=1:
Error handling
HTTP 200 does not mean success
ClickHouse begins streaming results immediately and sends the 200 OK status header before it knows whether the query will succeed. If an error occurs mid-stream, the error is appended to the response body — but the HTTP status remains 200.
This also affects inserts: a large POST that triggers a server-side timeout returns HTTP 200 OK with X-ClickHouse-Exception-Code: 209 in the headers and no data committed.
Your integration must:
- Check the
X-ClickHouse-Exception-Coderesponse header on every response, not just non-200s - Scan the response body for
Code: NNN. DB::Exception:if you are processing it as a stream - For DDL and short queries where latency is acceptable, add
wait_end_of_query=1to buffer the response server-side and get a reliable error status
HTTP status codes
| Status | Meaning |
|---|---|
| 200 | Query reached the server — check body and headers for errors |
| 400 | Bad request (malformed query, invalid parameter) |
| 401 | Authentication failed |
| 403 | Permission denied |
| 404 | Database or table not found |
| 500 | Server-side error (query execution failure, OOM, etc.) |
ClickHouse returns error details in the response body and as X-ClickHouse-Exception-Code and X-ClickHouse-Summary HTTP headers. Error format:
The numeric code (60) is stable across releases and suitable for programmatic handling.
Retry strategy
Retry on:
- Network-level errors (connection refused, timeout)
- HTTP 500 where the exception code indicates a transient condition (e.g., server overload)
Do not retry on:
- HTTP 400 (bad query — retrying won't help)
- HTTP 403 (wrong permissions — retrying won't help)
Use exponential backoff with jitter. Reuse the same query_id on retries for INSERT operations so ClickHouse can deduplicate.
Handling streaming errors
When using streaming output formats (e.g., JSONEachRow), ClickHouse may have already started writing rows before encountering an error. Errors in this case are appended at the end of the response stream rather than returned as an HTTP 500. Always read the full stream and check for a trailing error block:
ClickHouse Cloud considerations
Connection endpoints
ClickHouse Cloud service endpoints follow this pattern:
Where cloud is aws, gcp, or azure. Always use port 8443 (HTTPS) — port 8123 is not available on Cloud.
Auto-pause and connection retries
ClickHouse Cloud services on the development tier may auto-pause after a period of inactivity. An initial connection after a pause may take a few seconds to respond. Design your integration to retry connection attempts with a short backoff before surfacing an error to the user.
ClickHouse Cloud API for programmatic management
If your integration needs to enumerate or provision Cloud services (rather than query data), use the ClickHouse Cloud API. It is separate from the query interface and uses API key authentication.
Testing your integration
Test against both OSS and ClickHouse Cloud
Behavioral differences between self-managed ClickHouse and ClickHouse Cloud are minimal for most integrations, but test both. Specifically:
- ClickHouse Cloud always requires TLS
- Some system table columns may differ between versions
- Auto-pause behavior is Cloud-only
Cover edge-case data types
Most integration bugs are found with edge cases. Explicitly test:
Nullablecolumns — verify NULLs round-trip correctlyUInt64values nearLong.MAX_VALUEFixedString— verify zero-byte strippingDateTime64with sub-second precision and non-UTC timezones- Arrays and Maps — verify nested types survive serialization
- Empty result sets and single-row result sets
Use system.query_log to verify behavior
system.query_log is written asynchronously with a flush interval of approximately 7.5 seconds. Do not query it immediately after a test — add a small delay or poll with retry before expecting entries to appear.
After running your integration's test suite, inspect system.query_log to verify:
- Queries are attributed to your integration's
User-Agent - No unexpected full-table scans (check
read_rowsand whether the primary key is used) - Insert queries have the expected
written_rows
Implementation examples
HTTP API — Python connector skeleton
A minimal pattern for a Python-based connector that queries ClickHouse and streams results:
JDBC — Java BI connector skeleton
A minimal pattern for a JDBC-based BI connector with connection pooling and query tagging:
MCP integrations
For AI agent and LLM-powered integrations, see the dedicated MCP integration guide. It covers when to use MCP vs HTTP/JDBC, the ClickHouse Cloud built-in MCP server, the open-source server for self-hosted deployments, and best practices for building a custom MCP server.
Integration development checklist
- Integration uses HTTPS / TLS for all connections
- Credentials are read from environment variables or a secrets manager, not hardcoded
- A dedicated ClickHouse user with minimal permissions is used
-
User-Agentheader identifies the integration by name and version - Every query carries a
query_idfor traceability and idempotent retries - User-facing inputs pass through parameterized queries, not string concatenation
- Schema discovery queries exclude
systemandinformation_schemadatabases -
NullableandLowCardinalitytype wrappers are stripped before type mapping -
UInt64and larger unsigned integers are mapped toBigIntegeror equivalent -
FixedStringvalues are stripped of trailing null bytes before display - Bulk inserts use batches of 10,000–100,000 rows
-
insert_deduplication_tokenis set for retry-safe inserts -
X-ClickHouse-Exception-Codeheader is checked on every response (HTTP 200 does not guarantee success) - Streaming responses are fully consumed and checked for trailing error blocks
- Integration does not rely on JDBC transactions for consistency
- Mutations (DELETE/UPDATE) are not used for high-frequency updates; ReplacingMergeTree used for upsert patterns instead
- If using JavaScript/TypeScript:
output_format_json_quote_64bit_integers=1set to prevent Int64 precision loss - Table and column names use lowercase/snake_case (identifiers are case-sensitive)
- JDBC: new
PreparedStatementcreated per batch (reuse causes duplicate inserts in driver 0.8.6+) - Integration tested against both ClickHouse OSS and ClickHouse Cloud
- Edge-case types (Nullable, UInt64, DateTime64, Array, Map) covered in tests
- If building an MCP server (see MCP integration guide for details):
- Remote server uses Streamable HTTP transport and stateless request handlers
- All tools annotated with
readOnlyHint: trueunless writes are explicitly required - Server-side enforcement: read-only ClickHouse user + SQL validation, not just
readOnlyHint -
max_execution_time,max_result_rows, andmax_bytes_to_readenforced on the ClickHouse user or per-query - Tool descriptions guide agents toward aggregating queries; warn against full table scans
-
query_idandlog_commentset on every MCP-issued query for observability - Prompt injection mitigations in place for result data flowing back to the LLM