Cdc

Read(14) Label: debezium, cdc, mysql, postgresql,

1.  CdcCli external library offers Cdc external library functions. The directory containing files of Cdc external library is: installation directory\esProc\extlib\CdcCli.

 

2.  A JRE version 1.7 or above is required. To use a custome JRE, users need to configure java_home in config.txt under installation directory\esProc\bin. Just skip this step if the JRE version is adequate.

 

3.  To use the CdcCli external library, you need to have the data synchronization tool.

The data synchronization tool is located in DebeziumCli directory ([installation directory]\esProc\extlib\DebeziumCli). This tool works through Debezium (support 3.0.0 version or above) to store the changes resulted from data addition, deletion or modification performed on the database (support MySQL and PostgreSQL for the time being) in the multizone composite table.

The Raqsoft core jars for this external library are [installation directory]\esProc\extlib\CdcCli \scu-cdc-cli-2.10.jar and [installation directory]\esProc\extlib\DebeziumCli \scu-debezium-2.10.jar.

Besides the two core jars, there are also the following files in DebeziumCli directory:

antlr4-runtime-4.10.1.jar

connect-api-3.8.0.jar

connect-json-3.8.0.jar

connect-runtime-3.8.0.jar

debezium-api-3.0.0.Final.jar

debezium-connector-binlog-3.0.0.Final.jar

debezium-connector-mysql-3.0.0.Final.jar

debezium-connector-postgres-3.0.0.Final.jar

debezium-core-3.0.0.Final.jar

debezium-ddl-parser-3.0.0.Final.jar

debezium-embedded-3.0.0.Final.jar

debezium-storage-file-3.0.0.Final.jar

debezium-storage-kafka-3.0.0.Final.jar

fastjson-1.2.76.jar

jackson-annotations-2.16.2.jar

jackson-core-2.16.2.jar

jackson-databind-2.16.2.jar

jackson-module-afterburner-2.16.2.jar

kafka-clients-3.8.0.jar

log4j-1.2.17.jar

mysql-binlog-connector-java-0.31.0.jar

mysql-connector-j-9.0.0.jar

postgresql-42.2.14.jar

slf4j-api-1.7.30.jar

slf4j-reload4j-1.7.36.jar

Note: The third-party jars are encapsulated in the compression package and users can choose appropriate ones for specific scenarios.

There are another two other files:

log4j.properties

start_server.bat

 

4. Here is how to perform data synchronization:

(1) Modify the java path in start_server.bat. jre11 or a higher version is required. Below is the content of start_server.bat:

set EXECJAVA="D:\Java\jdk-11\bin\java"

%EXECJAVA% -Xms128m -Xmx1024m  -cp .;*;../../lib/*; com.scudata.lib.debezium.DebeziumCollect %1 %2

(2) In a directory on the same level as start_server.bat, create the configuration file mysql.properties or postgres.properties as needed. Below is content of the configuration file:

# Custom, unique connector name

name=testName

# Connector class name, which varies with different databases. Here we use MySQL as an example

connector.class=io.debezium.connector.mysql.MySqlConnector

# File-based offset persistence

offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore

offset.storage.file.filename=F:/tmp/dbz/storage/mysql_offsets.dat

offset.flush.interval.ms=60000

# Database server name

database.server.name=myDatabase

# The current database’s numeric ID, which is random by default

database.server.id=59059

# Database IP address and port number

database.hostname=localhost

database.port=3306

# Database connection username and password

database.user=username

database.password=password

# Database name

database.dbname =myDBname

# Set time zone

database.connectionTimeZone=GMT+8

# List of monitored data tables separated by comma

table.include.list=myDBname.table1,myDBname.table2

# Name of custom table level topic

topic.prefix=sync

# Replication slot name

slot.name=slotName

# Specify the location holding the database frame history storage

schema.history.internal=io.debezium.storage.file.history.FileSchemaHistory

schema.history.internal.file.filename=F:/tmp/dbz/storage/schemahistory.dat 

# Set snapshot mode

snapshot.mode=always

# Use the third-party interface to handle data type

decimal.handling.mode=double

numeric.handling.mode=numeric

# Data conversion, for which time type format is specified

converters=scudataConverter

scudataConverter.type=com.scudata.lib.debezium.ScudataConverter

scudataConverter.schema.name=io.debezium.scudata.type.scudataConverter

scudataConverter.format.date=yyyy-MM-dd

scudataConverter.format.time=HH:mm:ss

scudataConverter.format.datetime=yyyy-MM-dd HH:mm:ss

scudataConverter.format.timestamp=yyyy-MM-dd HH:mm:ss

scudataConverter.format.timestamptz=yyyy-MM-dd HH:mm:ss

scudataConverter.format.timestamp.zone=GMT+8

scudataConverter.format.timestamptz.zone=UTC

(3) Start MySQL or PostgreSQL database.

(4) Start data synchronization tool. The startup command and parameter explanation are as follows:

start_server.bat parameter 1 parameter 2

Parameter 1: The directory holding data files to be synced, which can be an absolute path or a relative path.

When data sync is finished, a dirout directory will be automatically generated in this path for storing the multizone composite table. The parameter value is the current directory by default, and this parameter cannot be omitted alone;

Parameter 2: The .properties configuration file to be connected, which is mysql.properties by default.

For example, start_server.bat F:/tmp/myTest postgres.properties means that configuration file postgres.properties is used and files generated from data sync will be stored in the specified directory F:/tmp/myTest/dirout.

(5) Then perform data addition, deletion or modification on data in the database, the updates will be recorded in the multizone composite table generated in dirout directory:

“postgres.employee_2024_12.ctx” records all data changes of postgre database’s employee table in December, 2024. A new multizone composite table will be created to record changes in the next month.

 

4.  The external library functions working on the multizone composite tables generated from data sync are cdc_collect() and cdc_merge(). Look them up in Help-Function referenceto find their uses.