I'm experiencing some quite weird behavior when working with the Confluent JDBC connector. I'm pretty sure it's not related to the Confluent stack, but to the Kafka Connect framework itself.
So, I set the offset.storage.file.filename property to the default /tmp/connect.offsets and run my sink connector. Obviously, I expect the connector to persist offsets in the given file (it doesn't exist on the file system, but it should be created automatically, right?). The documentation says:
offset.storage.file.filename
The file to store connector offsets in. By storing offsets on disk, a standalone process can be stopped and started on a single node and resume where it previously left off.
But Kafka Connect behaves in a completely different manner.
Is it a bug or, more likely, do I not understand how to work with this configuration? I understand the difference between the two approaches to persisting offsets, and file storage is more convenient for my needs.
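For context, a minimal standalone worker configuration along the lines of what I'm describing would look roughly like this (broker address and converters are just placeholders; only the offset file setting is the one in question):

    # connect-standalone worker properties (values other than the offset file are placeholders)
    bootstrap.servers=localhost:9092
    key.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter
    # the setting in question - I expected this file to be created automatically if missing
    offset.storage.file.filename=/tmp/connect.offsets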
filename : The storage file name for connector offsets. This file is stored on the local filesystem in standalone mode. Using the same file name for two workers will cause offset data to be deleted or overwritten with different values.
The Kafka Connect JDBC Sink connector allows you to export data from Apache Kafka® topics to any relational database with a JDBC driver. This connector can support a wide variety of databases. The connector polls data from Kafka to write to the database based on the topics subscription.
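As a rough illustration (the topic name, connection URL and credentials below are made up), a JDBC sink connector is typically configured with properties like these:

    name=my-jdbc-sink
    connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
    tasks.max=1
    # topic(s) to export - illustrative name
    topics=orders
    # target database - illustrative URL and credentials
    connection.url=jdbc:postgresql://localhost:5432/mydb
    connection.user=postgres
    connection.password=secret
    # let the connector create the target table if it does not exist
    auto.create=true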
Kafka Connect in distributed mode uses Kafka itself to persist the offsets of any source connectors. This is a great way to do things as it means that you can easily add more workers, rebuild existing ones, etc without having to worry about where the state is persisted.
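For example, a distributed worker is pointed at those internal topics with settings roughly like the following (the topic names shown are the defaults, the replication factors are illustrative):

    bootstrap.servers=localhost:9092
    group.id=connect-cluster
    # offsets of source connectors are stored here instead of a local file
    offset.storage.topic=connect-offsets
    offset.storage.replication.factor=3
    # connector configurations and task statuses are also kept in Kafka
    config.storage.topic=connect-configs
    config.storage.replication.factor=3
    status.storage.topic=connect-status
    status.storage.replication.factor=3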
Source connectors can also collect metrics from all your application servers and store the data in Kafka topics–making the data available for stream processing with low latency.
The offset.storage.file.filename is only used by source connectors. It is used to place a bookmark on the input data source and remember where it stopped reading. The created file contains something like the file line number (for a file source) or a table row number (for a JDBC source, or databases in general).
When running Kafka Connect in distributed mode, this file is replaced by a Kafka topic, named connect-offsets by default, which should be replicated in order to tolerate failures.
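If you want to see those source offsets in distributed mode, you can read the topic directly, for example like this (broker address is illustrative):

    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic connect-offsets \
      --from-beginning \
      --property print.key=true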
As far as sink connectors are concerned, no matter which plugin or mode (standalone/distributed) is used, they all store where they last stopped reading their input topics in an internal topic named __consumer_offsets, like any Kafka consumer. This allows you to use traditional tools like the kafka-consumer-groups.sh command-line tool to see how much the sink connector is lagging.
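For example, assuming a sink connector named my-jdbc-sink (sink connectors use a consumer group named connect-<connector name>), something like this shows its lag per partition:

    kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --describe \
      --group connect-my-jdbc-sink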
The Confluent Kafka Replicator, despite being a source connector, is probably an exception because it reads from a remote Kafka cluster and may use a Kafka consumer.
I agree that the documentation is not clear: this setting is required whatever the connector type (source or sink), but it is only used by source connectors. The reason behind this design decision is that a single Kafka Connect worker (that is, a single JVM process) can run multiple connectors, potentially both source and sink connectors. Put differently, this is a worker-level setting, not a connector setting.
The property offset.storage.file.filename only applies to workers running in standalone mode, and is only used by source connectors. If you are seeing Kafka persist offsets in a Kafka topic for a source connector, you are running in distributed mode. You should be launching your connector with the provided connect-standalone script. There's a description of the different modes here. Instructions on running in the different modes are here.
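As a sketch (file names here are illustrative, and the script may be called connect-standalone.sh in the Apache Kafka distribution), a standalone launch takes the worker properties file followed by one or more connector properties files:

    connect-standalone worker.properties my-jdbc-sink.properties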