
Kafka Connector for Oracle Database Source

I want to build a Kafka Connector in order to retrieve records from a database in near real time. My database is Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 and the tables have millions of records. First, I would like to put the minimum possible load on my database by using CDC. Second, I would like to retrieve records based on a LastUpdate field whose value is after a certain date.

Searching the Confluent site, the only open source connector that I found was the "Kafka Connect JDBC". I think this connector doesn't have a CDC mechanism, and that it isn't feasible to retrieve millions of records when the connector starts for the first time. The alternative I thought of is Debezium, but there is no Debezium Oracle Connector on the Confluent site and I believe it is still in beta.

Which solution would you suggest? Are any of my assumptions about the Kafka Connect JDBC or Debezium connectors wrong? Is there any other solution?

Lefteris Souvleros asked Jun 27 '19 08:06


People also ask

What is Kafka Connect Oracle?

kafka-connect-oracle is a Kafka source connector for capturing all row-based DML changes from an Oracle database and streaming those changes to Kafka. Its change data capture logic is based on the Oracle LogMiner solution. Only committed changes are pulled from Oracle, namely Insert, Update, and Delete operations.

What are some examples of Kafka connectors?

A few examples of such systems include HDFS, file systems, and databases. Kafka Connect provides predefined connector implementations for these common systems. There are two types of connectors: source connectors and sink connectors.

How do I set up streaming with Kafka connector?

Set the bootstrap server in your Kafka connector properties file to the Streaming endpoint on port 9092. For a list of Streaming endpoints, see the Streaming section in API Reference and Endpoints. Authentication with the Kafka protocol uses auth tokens and the SASL/PLAIN mechanism.
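
As a hedged sketch of what such a properties file might contain, the snippet below writes the SASL/PLAIN settings described above from Python. The endpoint, tenancy, user, stream pool OCID, and auth token are placeholders, and the exact username format is an assumption to verify against the Streaming documentation.

    # Sketch of the Kafka connector properties described above.
    # Endpoint and credential values are placeholders; substitute your own,
    # and check the Streaming docs for the exact username format.
    props = {
        "bootstrap.servers": "<streaming-endpoint>:9092",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": (
            'org.apache.kafka.common.security.plain.PlainLoginModule required '
            'username="<tenancy>/<user>/<stream-pool-ocid>" '
            'password="<auth-token>";'
        ),
    }

    # Write the settings out as a standard .properties file for the connector.
    with open("kafka-connector.properties", "w") as f:
        for key, value in props.items():
            f.write(f"{key}={value}\n")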

What are the throttle limits for Kafka Connect configuration topics?

To ensure that the Kafka Connect configuration topics are used only for their intended purpose by the connectors, there are hard throttle limits of 50 KB/s and 50 requests per second in place for these topics.


2 Answers

For query-based CDC, which is less efficient, you can use the JDBC source connector.
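
To make the query-based approach concrete, here is a minimal sketch of registering the JDBC source connector in timestamp mode through the Kafka Connect REST API. The connector name, connection details, table, and the LASTUPDATE column are placeholders for the setup described in the question, not a definitive configuration.

    import requests

    # Hypothetical JDBC source connector config for query-based CDC.
    # Connection details, table and column names below are placeholders.
    connector = {
        "name": "oracle-jdbc-source",
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": "jdbc:oracle:thin:@//db-host:1521/ORCL",
            "connection.user": "kafka_user",
            "connection.password": "secret",
            "table.whitelist": "MY_TABLE",
            "mode": "timestamp",                    # poll only rows with a newer timestamp
            "timestamp.column.name": "LASTUPDATE",  # the LastUpdate column from the question
            "topic.prefix": "oracle-",
            "poll.interval.ms": "5000",
        },
    }

    # Register the connector with a Kafka Connect worker (REST API, default port 8083).
    resp = requests.post("http://connect-host:8083/connectors", json=connector)
    resp.raise_for_status()
    print(resp.json())

Depending on the connector version, a timestamp.initial property may also be available to start polling from a specific epoch timestamp rather than from the beginning, which would address the "only records after a certain date" requirement; check the documentation for the version you run.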


For log-based CDC, I am aware of a couple of options; however, some of them require a license:

1) Attunity Replicate, which allows users to use a graphical interface to create real-time data pipelines from producer systems into Apache Kafka without having to do any manual coding or scripting. I have been using Attunity Replicate for Oracle -> Kafka for a couple of years and have been very satisfied.

2) Oracle GoldenGate, which requires a license.

3) Oracle LogMiner, which does not require any license and is used by both Attunity and kafka-connect-oracle. kafka-connect-oracle is a Kafka source connector for capturing all row-based DML changes from an Oracle database and streaming those changes to Kafka; its change data capture logic is based on the Oracle LogMiner solution. A hedged configuration sketch follows this list.
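
For illustration, a sketch of a kafka-connect-oracle configuration, registered the same way as the JDBC example above. The property names follow my reading of the project's README and should be treated as assumptions; verify them against the connector version you actually deploy.

    # Hypothetical kafka-connect-oracle (LogMiner-based) source connector config.
    # Property names and values are assumptions based on the project's README;
    # connection details, topic and table names are placeholders.
    logminer_connector = {
        "name": "oracle-logminer-source",
        "config": {
            "connector.class": "com.ecer.kafka.connect.oracle.OracleSourceConnector",
            "tasks.max": "1",
            "topic": "oracle-cdc",
            "db.name.alias": "ORCL",
            "db.name": "ORCL",
            "db.hostname": "db-host",
            "db.port": "1521",
            "db.user": "kafka_user",
            "db.user.password": "secret",
            "table.whitelist": "MYSCHEMA.MY_TABLE",
            "parse.dml.data": "true",
            "reset.offset": "false",
        },
    }
    # POST this JSON to the Kafka Connect REST API exactly as in the JDBC example.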

Giorgos Myrianthous answered Nov 02 '22 18:11


We have numerous customers using IBM's IIDR (InfoSphere Data Replication) product to replicate data from Oracle databases (as well as IBM Z mainframes, IBM i, SQL Server, etc.) into Kafka.

Regardless of which source is used, data can be normalized into one of many formats in Kafka. An example of an included, selectable format is:

https://www.ibm.com/support/knowledgecenter/en/SSTRGZ_11.4.0/com.ibm.cdcdoc.cdckafka.doc/tasks/kcopauditavrosinglerow.html

The solution is highly scalable and has been measured replicating changes at rates in the hundreds of thousands of rows per second.

We also have a proprietary ability to reconstitute data written in parallel to Kafka back into its original source order. So, despite the data having been written across numerous partitions and topics, the original total order can be known. This functionality is known as the TCC (transactionally consistent consumer).

See the video and slides here... https://kafka-summit.org/sessions/exactly-once-replication-database-kafka-cloud/

Shawn answered Nov 02 '22 17:11