Commit 37e2a55

nizhikov and Nikita-tech-writer
authored and committed
IGNITE-14365 CDC Documentation (#9708)
Co-authored-by: Nikita Safonov <73828260+Nikita-tech-writer@users.noreply.github.com> (cherry picked from commit 18b47d9)
1 parent b1289f7 commit 37e2a55

4 files changed

Lines changed: 138 additions & 1 deletion

Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
// Licensed to the Apache Software Foundation (ASF) under one or more
2+
// contributor license agreements. See the NOTICE file distributed with
3+
// this work for additional information regarding copyright ownership.
4+
// The ASF licenses this file to You under the Apache License, Version 2.0
5+
// (the "License"); you may not use this file except in compliance with
6+
// the License. You may obtain a copy of the License at
7+
//
8+
// http://www.apache.org/licenses/LICENSE-2.0
9+
//
10+
// Unless required by applicable law or agreed to in writing, software
11+
// distributed under the License is distributed on an "AS IS" BASIS,
12+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
// See the License for the specific language governing permissions and
14+
// limitations under the License.
15+
= Change Data Capture
16+
17+
18+
== Overview
19+
Change Data Capture (link:https://en.wikipedia.org/wiki/Change_data_capture[CDC]) is a data processing pattern used to asynchronously receive entries that have been changed on the local node so that action can be taken using the changed entry.
20+
21+
WARNING: CDC is an experimental feature. Its API or design architecture might change.
22+
23+
Below are some of the CDC use cases:
24+
25+
* Streaming changes to a data warehouse;
26+
* Updating search index;
27+
* Calculating statistics (streaming queries);
28+
* Auditing logs;
29+
* Asynchronous interaction with an external system: moderation, business process invocation, etc.
30+
31+
Ignite implements CDC with the `ignite-cdc.sh` application and link:https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/cdc/CdcConsumer.java#L56[Java API].
32+
33+
The diagram below shows how the CDC application and the Ignite node are integrated via WAL archive segments:
34+
35+
image:../../assets/images/integrations/CDC-design.svg[]
36+
37+
When CDC is enabled, the Ignite server node creates a hard link to each WAL archive segment in the special `db/cdc/\{consistency_id\}` directory.
38+
The `ignite-cdc.sh` application runs on a different JVM and processes newly archived link:native-persistence.adoc#_write-ahead_log[WAL segments].
39+
When a segment is fully processed by `ignite-cdc.sh`, its CDC link is removed. The actual disk space is freed only when both links (archive and CDC) are removed.
40+
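The hard-link behavior described above can be illustrated with plain `java.nio.file` calls. This is a minimal sketch, not Ignite code: the class name `HardLinkDemo` and the file names are hypothetical, and it assumes a file system that supports hard links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of two hard links to one file; not part of Ignite.
final class HardLinkDemo {
    /**
     * Creates a file (standing in for an archived WAL segment), hard-links it
     * (standing in for the CDC directory entry), deletes the first link, and
     * reports whether the data is still readable through the second link.
     */
    static boolean dataSurvivesArchiveRemoval() {
        try {
            Path dir = Files.createTempDirectory("cdc-demo");
            Path archive = dir.resolve("archive-segment.wal"); // assumed name
            Path cdcLink = dir.resolve("cdc-segment.wal");     // assumed name

            Files.write(archive, "segment-data".getBytes(StandardCharsets.UTF_8));

            // Second directory entry pointing at the same data on disk.
            Files.createLink(cdcLink, archive);

            // Removing one link does not free the data while another link exists.
            Files.delete(archive);

            String content = new String(Files.readAllBytes(cdcLink), StandardCharsets.UTF_8);

            // Removing the last link is what lets the file system reclaim the space.
            Files.delete(cdcLink);
            Files.delete(dir);

            return "segment-data".equals(content);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this model, the order in which the two links are deleted does not matter: the space is reclaimed only after the last one is gone.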
41+
The consumption state is a pointer to the last processed event.
42+
The consumer can tell `ignite-cdc.sh` to save the consumption state.
43+
On startup, event processing continues from the last saved state.
44+
45+
== Configuration
46+
47+
=== Ignite Node
48+
49+
[cols="20%,45%,35%",opts="header"]
50+
|===
51+
|Name |Description | Default value
52+
| `DataStorageConfiguration#cdcEnabled` | Flag to enable CDC on the server node. | `false`
53+
| `DataStorageConfiguration#cdcWalPath` | Path to the CDC directory. | `"db/wal/cdc"`
54+
| `DataStorageConfiguration#walForceArchiveTimeout` | Timeout to forcefully archive the WAL segment even if it is not complete. | `-1` (disabled)
55+
|===
56+
57+
=== CDC Application
58+
59+
The CDC application is configured in the same way as the Ignite node, via a Spring XML file:
60+
61+
* `ignite-cdc.sh` requires both Ignite and CDC configurations to start;
62+
* `IgniteConfiguration` is used to determine common options such as the path to the CDC directory, the node consistent ID, and other parameters;
63+
* `CdcConfiguration` contains `ignite-cdc.sh`-specific options.
64+
65+
[cols="20%,45%,35%",opts="header"]
66+
|===
67+
|Name |Description | Default value
68+
| `lockTimeout` | Timeout to wait for lock acquisition. On startup, CDC locks the directory to ensure that no other `ignite-cdc.sh` instance is processing the same directory.
69+
| 1000 milliseconds.
70+
| `checkFrequency` | Amount of time the application sleeps between subsequent checks when no new files are available. | 1000 milliseconds.
71+
| `keepBinary` | Flag to specify if key and value of changed entries should be provided in link:../key-value-api/binary-objects.adoc[binary format]. | `true`
72+
| `consumer` | Implementation of `org.apache.ignite.cdc.CdcConsumer` that consumes entry changes. | null
73+
| `metricExporterSpi` | Array of SPIs used to export CDC metrics. See also the link:../monitoring-metrics/new-metrics-system.adoc#_metric_exporters[metrics] documentation. | null
74+
|===
75+
76+
== API
77+
78+
=== `org.apache.ignite.cdc.CdcEvent`
79+
80+
A `CdcEvent` reflects a single change of the data. Its methods are listed below.
81+
82+
[cols="20%,80%",opts="header"]
83+
|===
84+
|Name |Description
85+
| `key()` | Key for the changed entry.
86+
| `value()` | Value for the changed entry. This method will return `null` if the event reflects removal.
87+
| `cacheId()` | ID of the cache where the change happens. The value is equal to the `CACHE_ID` from link:../monitoring-metrics/system-views.adoc#_CACHES[`SYS.CACHES`].
88+
| `partition()` | Partition of the changed entry.
89+
| `primary()` | Flag that indicates whether the operation happened on the primary or a backup node.
90+
| `version()` | `Comparable` version of the changed entry. Internally, Ignite maintains ordered versions of each entry so any changes of the same entry can be sorted.
91+
|===
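Because versions are comparable, a consumer can resolve several changes of the same key by keeping only the change with the highest version. The sketch below shows that idea with a hypothetical stand-in class, `DemoEvent`, which mirrors only `key()`, `value()`, and `version()`; it does not use the real `CdcEvent` type, and it models the version as a `Long` purely for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a change event; not the Ignite CdcEvent type. */
final class DemoEvent {
    final String key;
    final String value;  // null models a removal, as described for value()
    final Long version;  // assumption: any Comparable would work the same way

    DemoEvent(String key, String value, Long version) {
        this.key = key;
        this.value = value;
        this.version = version;
    }
}

/** Builds the latest known state from events that may arrive out of order. */
final class LatestStateBuilder {
    static Map<String, String> apply(List<DemoEvent> events) {
        Map<String, DemoEvent> latest = new HashMap<>();

        for (DemoEvent e : events) {
            DemoEvent prev = latest.get(e.key);

            // Keep the event with the greater version for each key.
            if (prev == null || e.version.compareTo(prev.version) > 0)
                latest.put(e.key, e);
        }

        Map<String, String> state = new HashMap<>();

        for (DemoEvent e : latest.values()) {
            // A null value means the latest change was a removal.
            if (e.value != null)
                state.put(e.key, e.value);
        }

        return state;
    }
}
```

With this approach the result does not depend on the order in which events for the same key are delivered.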
92+
93+
=== `org.apache.ignite.cdc.CdcConsumer`
94+
95+
The consumer of change events. It should be implemented by the user.
96+
[cols="20%,80%",opts="header"]
97+
|===
98+
|Name |Description
99+
| `void start(MetricRegistry)` | Invoked once at the start of the CDC application. `MetricRegistry` should be used to export the consumer-specific metrics.
100+
| `boolean onEvents(Iterator<CdcEvent> events)` | The main method that processes changes. When this method returns `true`, the state is saved on the disk. The state points to the event that follows the last read event. In case of any failure, consumption continues from the last saved state.
101+
| `void stop()` | Invoked once when the CDC application stops.
102+
|===
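The contract of `onEvents` (return `true` to have the state saved, resume from the last saved state after a failure) can be modeled without Ignite on the classpath. In the sketch below, `DemoConsumer`, `BatchingConsumer`, and `DemoDriver` are hypothetical names, events are plain strings, and the driver is a deliberately simplified model that does not claim to mirror how `ignite-cdc.sh` is implemented.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in that mirrors only the shape of CdcConsumer#onEvents. */
interface DemoConsumer {
    boolean onEvents(Iterator<String> events);
}

/** Reads up to batchSize events per call and then asks for the state to be saved. */
final class BatchingConsumer implements DemoConsumer {
    private final int batchSize;
    final List<String> delivered = new ArrayList<>();

    BatchingConsumer(int batchSize) {
        if (batchSize <= 0)
            throw new IllegalArgumentException("batchSize must be positive");

        this.batchSize = batchSize;
    }

    @Override public boolean onEvents(Iterator<String> events) {
        int read = 0;

        while (events.hasNext() && read < batchSize) {
            delivered.add(events.next());
            read++;
        }

        // true = "everything read so far is processed, save the state".
        return read > 0;
    }
}

/** Simplified model of a driver that tracks the saved state as a list index. */
final class DemoDriver {
    /** Feeds events starting at savedState and returns the new saved state. */
    static int run(final List<String> log, int savedState, DemoConsumer consumer) {
        final int[] pos = {savedState};

        Iterator<String> it = new Iterator<String>() {
            @Override public boolean hasNext() {
                return pos[0] < log.size();
            }

            @Override public String next() {
                if (!hasNext())
                    throw new NoSuchElementException();

                return log.get(pos[0]++);
            }
        };

        while (it.hasNext()) {
            // The saved state points to the event after the last read event.
            if (consumer.onEvents(it))
                savedState = pos[0];
        }

        return savedState;
    }
}
```

Calling `DemoDriver.run(log, 3, consumer)` models a restart: events before index 3 are not delivered again because the saved state already points past them.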
103+
104+
== Metrics
105+
106+
`ignite-cdc.sh` uses the same SPI to export metrics as Ignite does.
107+
The following metrics are provided by the application (additional metrics can be provided by the consumer):
108+
|===
109+
|Name |Description
110+
| CurrentSegmentIndex | Index of the WAL segment currently being processed.
111+
| CommittedSegmentIndex | Index of the WAL segment that contains the last committed state.
112+
| CommittedSegmentOffset | Committed offset in bytes inside the WAL segment.
113+
| LastSegmentConsumptionTime | Timestamp (in milliseconds) of the start of the last segment processing.
114+
| BinaryMetaDir | Binary meta-directory the application reads data from.
115+
| MarshallerDir | Marshaller directory the application reads data from.
116+
| CdcDir | The CDC directory the application reads data from.
117+
|===
118+
119+
== Logging
120+
121+
`ignite-cdc.sh` uses the same logging configuration as the Ignite node does. The only difference is that the log is written to the `ignite-cdc.log` file.
122+
123+
== Lifecycle
124+
125+
IMPORTANT: `ignite-cdc.sh` implements the fail-fast approach: it fails on any error. The restart procedure should be configured with OS tools.
126+
127+
1. Find the required shared directories. Take the values from the provided `IgniteConfiguration`.
128+
2. Lock the CDC directory.
129+
3. Load the saved state.
130+
4. Start the consumer.
131+
5. Wait indefinitely for newly available segments and process them.
132+
6. Stop the consumer in case of a failure or a received stop signal.
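The lock, load, process, and save steps above can be sketched with the standard library alone. Everything in this example is an assumption made for illustration: the class `LifecycleSketch`, the `lock` and `state` file names, and the representation of the state as a single number are hypothetical and are not taken from the `ignite-cdc.sh` implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.LongUnaryOperator;

/** Hypothetical, simplified model of one pass through the lifecycle. */
final class LifecycleSketch {
    /** Creates a scratch directory that stands in for the CDC directory. */
    static Path tempDir() {
        try {
            return Files.createTempDirectory("cdc-lifecycle");
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Locks the directory, loads the saved state, lets the consumer process
     * from that state, and saves the state the consumer returns.
     * Any error propagates to the caller (fail-fast); nothing is retried.
     */
    static long runOnce(Path cdcDir, LongUnaryOperator consumer) {
        Path lockFile = cdcDir.resolve("lock");   // assumed file name
        Path stateFile = cdcDir.resolve("state"); // assumed file name

        try (FileChannel ch = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.tryLock()) {
            // tryLock() returns null when another process holds the lock.
            if (lock == null)
                throw new IllegalStateException("Directory is already locked: " + cdcDir);

            // Load the saved state; start from 0 when nothing was saved yet.
            long state = Files.exists(stateFile)
                ? Long.parseLong(new String(Files.readAllBytes(stateFile),
                    StandardCharsets.UTF_8).trim())
                : 0L;

            // The consumer processes events and reports the new state.
            long newState = consumer.applyAsLong(state);

            Files.write(stateFile, Long.toString(newState).getBytes(StandardCharsets.UTF_8));

            return newState;
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `runOnce` twice on the same directory models a restart: the second call starts from the state saved by the first one. Unlike this single-pass sketch, a long-running process would hold the lock for its whole lifetime.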

docs/_docs/persistence/native-persistence.adoc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@ The Native Persistence functionality is based on the following features:
3333
* Storing data partitions on disk
3434
* Write-ahead logging
3535
* Checkpointing
36+
* link:change-data-capture.adoc[Change Data Capture]
3637
* Usage of OS swap
3738
////
3839
*TODO: diagram: update operation + wal + checkpointing*
Lines changed: 4 additions & 0 deletions

modules/core/src/main/java/org/apache/ignite/cdc/CdcConsumer.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,7 @@
3535
* <li>Stop of the consumer {@link #stop()}.</li>
3636
* </ul>
3737
*
38-
* In case consumer implementation wants to user {@link IgniteLogger}, please, use, {@link LoggerResource} annotation:
38+
* In case consumer implementation wants to use {@link IgniteLogger}, please, use, {@link LoggerResource} annotation:
3939
* <pre>
4040
* public class ChangeDataCaptureConsumer implements ChangeDataCaptureConsumer {
4141
* &#64;LoggerResource
