
Kafka Connector for Oracle Database Source

I want to build a Kafka Connector in order to retrieve records from a database in near real time. My database is Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 and the tables have millions of records. First, I would like to put the minimum possible load on my database by using CDC. Second, I would like to retrieve records based on a LastUpdate field whose value is after a certain date.

Searching the Confluent site, the only open source connector that I found was the "Kafka Connect JDBC". I think this connector doesn't have a CDC mechanism, and that it isn't feasible to retrieve millions of records when the connector starts for the first time. The alternative I thought of is Debezium, but there is no Debezium Oracle Connector on the Confluent site and I believe it is still in beta.

Which solution would you suggest? Are any of my assumptions about the Kafka Connect JDBC or Debezium connectors wrong? Is there any other solution?

Lefteris Souvleros asked Jun 27 '19 08:06


People also ask

What is Kafka Connect Oracle?

kafka-connect-oracle is a Kafka source connector for capturing all row-based DML changes from an Oracle database and streaming those changes to Kafka. Its change data capture logic is based on the Oracle LogMiner solution. Only committed changes are pulled from Oracle, namely Insert, Update, and Delete operations.

What are some examples of Kafka connectors?

A few examples of such systems include HDFS, file systems, and databases. Kafka Connect provides predefined connector implementations for these common systems. There are two types of connectors: source connectors and sink connectors.

How do I set up streaming with Kafka connector?

Set the bootstrap server in your Kafka connector properties file to the Streaming endpoint on port 9092. For a list of Streaming endpoints, see the Streaming section in API Reference and Endpoints. Authentication with the Kafka protocol uses auth tokens and the SASL/PLAIN mechanism.
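
As a hedged sketch of what such a properties file might contain, the snippet below writes the SASL/PLAIN settings described above from Python. The endpoint, tenancy, user, stream pool OCID, and auth token are placeholders, and the exact username format is an assumption to verify against the Streaming documentation.

    # Sketch of the Kafka connector properties described above.
    # Endpoint and credential values are placeholders; substitute your own,
    # and check the Streaming docs for the exact username format.
    props = {
        "bootstrap.servers": "<streaming-endpoint>:9092",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": (
            'org.apache.kafka.common.security.plain.PlainLoginModule required '
            'username="<tenancy>/<user>/<stream-pool-ocid>" '
            'password="<auth-token>";'
        ),
    }

    # Write the settings out as a standard .properties file for the connector.
    with open("kafka-connector.properties", "w") as f:
        for key, value in props.items():
            f.write(f"{key}={value}\n")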

What are the throttle limits for Kafka Connect configuration topics?

To ensure that the Kafka Connect configuration topics are used only for their intended purpose by the connectors, there are hard throttle limits of 50 KB/s and 50 requests per second in place for these topics.


2 Answers

For query-based CDC, which is less efficient, you can use the JDBC source connector.
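
To make the query-based approach concrete, here is a minimal sketch of registering the JDBC source connector in timestamp mode through the Kafka Connect REST API. The connector name, connection details, table, and the LASTUPDATE column are placeholders for the setup described in the question, not a definitive configuration.

    import requests

    # Hypothetical JDBC source connector config for query-based CDC.
    # Connection details, table and column names below are placeholders.
    connector = {
        "name": "oracle-jdbc-source",
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": "jdbc:oracle:thin:@//db-host:1521/ORCL",
            "connection.user": "kafka_user",
            "connection.password": "secret",
            "table.whitelist": "MY_TABLE",
            "mode": "timestamp",                    # poll only rows with a newer timestamp
            "timestamp.column.name": "LASTUPDATE",  # the LastUpdate column from the question
            "topic.prefix": "oracle-",
            "poll.interval.ms": "5000",
        },
    }

    # Register the connector with a Kafka Connect worker (REST API, default port 8083).
    resp = requests.post("http://connect-host:8083/connectors", json=connector)
    resp.raise_for_status()
    print(resp.json())

Depending on the connector version, a timestamp.initial property may also be available to start polling from a specific epoch timestamp rather than from the beginning, which would address the "only records after a certain date" requirement; check the documentation for the version you run.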


For log-based CDC, I am aware of a couple of options; however, some of them require a license:

1) Attunity Replicate, which allows users to use a graphical interface to create real-time data pipelines from producer systems into Apache Kafka without having to do any manual coding or scripting. I have been using Attunity Replicate for Oracle -> Kafka for a couple of years and have been very satisfied.

2) Oracle GoldenGate, which requires a license.

3) Oracle LogMiner, which does not require any license and is used by both Attunity and kafka-connect-oracle. kafka-connect-oracle is a Kafka source connector for capturing all row-based DML changes from an Oracle database and streaming those changes to Kafka; its change data capture logic is based on the Oracle LogMiner solution. A hedged configuration sketch follows this list.
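
For illustration, a sketch of a kafka-connect-oracle configuration, registered the same way as the JDBC example above. The property names follow my reading of the project's README and should be treated as assumptions; verify them against the connector version you actually deploy.

    # Hypothetical kafka-connect-oracle (LogMiner-based) source connector config.
    # Property names and values are assumptions based on the project's README;
    # connection details, topic and table names are placeholders.
    logminer_connector = {
        "name": "oracle-logminer-source",
        "config": {
            "connector.class": "com.ecer.kafka.connect.oracle.OracleSourceConnector",
            "tasks.max": "1",
            "topic": "oracle-cdc",
            "db.name.alias": "ORCL",
            "db.name": "ORCL",
            "db.hostname": "db-host",
            "db.port": "1521",
            "db.user": "kafka_user",
            "db.user.password": "secret",
            "table.whitelist": "MYSCHEMA.MY_TABLE",
            "parse.dml.data": "true",
            "reset.offset": "false",
        },
    }
    # POST this JSON to the Kafka Connect REST API exactly as in the JDBC example.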

Giorgos Myrianthous answered Nov 02 '22 18:11


We have numerous customers using IBM's IIDR (InfoSphere Data Replication) product to replicate data from Oracle databases (as well as IBM Z mainframes, IBM i, SQL Server, etc.) into Kafka.

Regardless of which source is used, data can be normalized into one of many formats in Kafka. An example of an included, selectable format is:

https://www.ibm.com/support/knowledgecenter/en/SSTRGZ_11.4.0/com.ibm.cdcdoc.cdckafka.doc/tasks/kcopauditavrosinglerow.html

The solution is highly scalable and has been measured replicating changes at rates in the hundreds of thousands of rows per second.

We also have a proprietary ability to reconstitute data written in parallel to Kafka back into its original source order. So, despite the data having been written across numerous partitions and topics, the original total order can be known. This functionality is known as the TCC (transactionally consistent consumer).

See the video and slides here... https://kafka-summit.org/sessions/exactly-once-replication-database-kafka-cloud/

Shawn answered Nov 02 '22 17:11