How Are Log Shipping and Delayed Log Shipping Implemented in ClickHouse?

Log shipping is a method ClickHouse uses to replicate data across different instances or clusters to ensure high availability, fault tolerance, and disaster recovery. Delayed log shipping extends this concept by introducing a deliberate lag in replication, providing a window for recovery from operational mishaps such as accidental data deletions or corruptions. Here's how both are implemented in ClickHouse:

Log Shipping in ClickHouse

Log shipping in ClickHouse is primarily achieved through the use of ReplicatedMergeTree table engines, which automatically handle data replication between replicas within a cluster. This process involves several key components and steps:

  1. ZooKeeper: ClickHouse uses Apache ZooKeeper (or its built-in replacement, ClickHouse Keeper) to coordinate replication, maintain a consistent state across replicas, and store metadata about data parts and their replication status.

  2. Table Engine: To enable log shipping, tables must be created with the ReplicatedMergeTree engine. This engine requires specifying a ZooKeeper path and a replica name as part of the table's creation statement.

  3. Data Parts: Data in ReplicatedMergeTree tables is divided into parts, each a set of rows stored together. When data is inserted, the new part is written locally and an entry describing it is appended to the replication log (a queue of actions) in ZooKeeper; the data itself never passes through ZooKeeper.

  4. Replication: Replicas continuously watch the ZooKeeper log for new entries. When a new entry is detected, each replica fetches the corresponding data part directly from a replica that already has it, so all replicas eventually hold the same data.

     CREATE TABLE replicated_table
     (
         date Date,
         event_name String,
         event_count Int32
     )
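     -- First argument: the table's path in ZooKeeper/Keeper; second argument:
     -- this replica's name, typically supplied via the {replica} macro.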
     ENGINE = ReplicatedMergeTree('/path/to/zookeeper/tables/replicated_table', '{replica}')
     PARTITION BY toYYYYMM(date)
     ORDER BY (date, event_name);
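
A quick way to watch log shipping in action is to insert on one replica and inspect the replication queue. A minimal sketch against the table defined above (the inserted values are illustrative):

    -- Insert on any replica; an entry for the new part is registered in the
    -- ZooKeeper log, and the other replicas fetch the part.
    INSERT INTO replicated_table VALUES ('2024-01-01', 'page_view', 42);

    -- Pending replication actions on the current replica.
    SELECT type, new_part_name, source_replica
    FROM system.replication_queue
    WHERE table = 'replicated_table';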
    

Delayed Log Shipping in ClickHouse

Delayed log shipping is not a natively named feature in ClickHouse, but it can be achieved by controlling when a replica applies entries from its replication queue. Here are a few ways to introduce a deliberate lag:

  1. Replica Delay Configuration: ClickHouse does not expose a dedicated "apply delay" setting; the closest built-in control is pausing the replication queue on the designated delayed replica with SYSTEM STOP REPLICATION QUEUES and resuming it with SYSTEM START REPLICATION QUEUES. While the queue is stopped, log entries accumulate without being executed, which effectively delays replication on that replica (see the sketch after this list).

  2. Manual Delay: Another approach is to manage the replication schedule yourself, for example by stopping part fetches on the delayed replica with SYSTEM STOP FETCHES and restarting them from an external scheduler (such as a cron job) once the desired lag has elapsed, allowing for a controlled delay (also covered in the sketch below).

  3. Use of Buffer Tables: By using ClickHouse's Buffer engine, you can hold incoming data in memory before it is flushed to the ReplicatedMergeTree table. This does not delay replication of data already in the table, but it defers when new data enters it, and therefore when that data is replicated (a sketch follows this list).
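
A minimal sketch of the queue-pausing approach from items 1 and 2, assuming the statements are run on the replica that should lag and that an external scheduler re-issues them on the desired cadence (the table name matches the earlier example):

    -- On the delayed replica: stop applying replicated log entries.
    -- Entries keep accumulating in the queue but are not executed.
    SYSTEM STOP REPLICATION QUEUES replicated_table;

    -- Optionally also stop downloading data parts from other replicas.
    SYSTEM STOP FETCHES replicated_table;

    -- Later (e.g. from a cron job, once the desired lag has elapsed):
    SYSTEM START FETCHES replicated_table;
    SYSTEM START REPLICATION QUEUES replicated_table;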
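And a sketch of the Buffer approach, again using the table from the earlier example. The thresholds are illustrative, with the time thresholds chosen so that flushing (and hence replication of new data) is deferred by roughly ten minutes:

    -- Buffer(database, table, num_layers, min_time, max_time,
    --        min_rows, max_rows, min_bytes, max_bytes)
    CREATE TABLE replicated_table_buffer AS replicated_table
    ENGINE = Buffer(currentDatabase(), replicated_table, 16,
                    600, 600,                -- flush after ~600 seconds
                    1000000, 100000000,      -- row thresholds
                    10000000, 1000000000);   -- byte thresholds

    -- Applications write to the buffer; ClickHouse flushes downstream.
    INSERT INTO replicated_table_buffer VALUES ('2024-01-01', 'page_view', 1);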

Considerations for Delayed Log Shipping

  • Data Recovery Window: The delay introduced provides a window to recover from accidental data deletions or corruptions before the erroneous data is replicated to the delayed replica.

  • Operational Overhead: Implementing and managing delayed log shipping can introduce additional complexity in managing replication settings and monitoring the actual replication delay (a monitoring sketch follows this list).

  • Consistency Impact: Delayed replication means replicas are temporarily inconsistent with one another; queries routed to the delayed replica will read stale data, which must be considered in applications requiring strict consistency guarantees.
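
The lag of the delayed replica can be watched from the system.replicas table; a minimal sketch, where the 1800-second threshold is an arbitrary example:

    -- Run on the delayed replica: flag tables lagging beyond the intended window.
    SELECT database, table, replica_name, absolute_delay
    FROM system.replicas
    WHERE absolute_delay > 1800;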

While ClickHouse's replication features natively support log shipping through the ReplicatedMergeTree family of table engines, implementing delayed log shipping requires careful configuration and operational management to ensure it meets your data recovery and availability needs.
