Forgive me, I am just learning Kafka. I have encountered the term "commit log" many times while reading the Kafka material, but I still have no idea what exactly it is. The links I mean are below.
https://kafka.apache.org/documentation/#uses_commitlog
Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data.
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
One of the most useful things I learned in all this was that many of the things we were building had a very simple concept at their heart: the log. Sometimes called write-ahead logs or commit logs or transaction logs, …
https://kafka.apache.org/protocol.html#protocol_partitioning
Kafka is a partitioned system so not all servers have the complete data set. Instead recall that topics are split into a pre-defined number of partitions, P, and each partition is replicated with some replication factor, N. Topic partitions themselves are just ordered "commit logs" numbered 0, 1, ..., P.
What does "commit log" mean? Is it any different from the concept in a DBMS? How should I understand it? Thanks.
A commit log is a record of transactions. It is used to keep track of what is happening and to help with, for example, disaster recovery: generally, every commit is written to the log before it is applied, so transactions that were in flight when the server went down can be recovered and re-applied by reading the log.
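To make that pattern concrete, here is a minimal sketch in Java (not Kafka's or any database's actual implementation; the class, file format, and file name are made up for illustration). Each change is appended and flushed to a log file before it is applied to the in-memory state, so the state can be rebuilt by replaying the file after a crash:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

// Hypothetical illustration of the commit-log pattern:
// durably append the change to the log BEFORE applying it to the state.
public class TinyCommitLog {
    private final Path logFile;
    private final Map<String, String> state = new HashMap<>(); // the "database"

    public TinyCommitLog(Path logFile) throws IOException {
        this.logFile = logFile;
        replay(); // recover state from the log after a restart or crash
    }

    public void put(String key, String value) throws IOException {
        String entry = "PUT\t" + key + "\t" + value + System.lineSeparator();
        // 1. Record the intent durably first ...
        Files.write(logFile, entry.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                StandardOpenOption.SYNC);
        // 2. ... then apply it to the live state.
        state.put(key, value);
    }

    public String get(String key) {
        return state.get(key);
    }

    private void replay() throws IOException {
        if (!Files.exists(logFile)) return;
        // Re-apply every logged change in order to rebuild the state.
        for (String line : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t");
            if (parts.length == 3 && parts[0].equals("PUT")) {
                state.put(parts[1], parts[2]);
            }
        }
    }
}
```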
In Postgres, this role is played by the write-ahead log (WAL). Each write to a Postgres database must first be recorded in the WAL before the data is changed in a table or an index. One benefit is that it speeds up database writes: appending to a log is relatively fast, even on disk, because it is sequential I/O.
Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction feature in Kafka helps support this usage. In this usage Kafka is similar to Apache BookKeeper project.
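Regarding the log-compaction usage mentioned in that quote: a compacted topic retains at least the latest record for each key, which is what makes it practical as an "external commit log" that failed nodes can re-sync from. As a sketch, such a topic can be created with the standard Java AdminClient; the topic name, partition count, replication factor, and broker address below are placeholders:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 1 -- illustrative values only.
            NewTopic topic = new NewTopic("change-log", 3, (short) 1)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```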
Conceptually there's no difference between the "commit log" that Kafka provides and the commit log/transaction log/write ahead log that a DBMS uses: They're both about recording the changes made to something so that it can be replayed later.
In the case of a DBMS this replay will happen if the DB was not shut down cleanly and is necessary to ensure the DB resumes service in a consistent state. Importantly, in a DB this commit log is an implementation detail of the database and is not a concern of the database clients.
In a Kafka application this commit log is a first class concept. Subscribers to a topic can reconstruct the state of the application for themselves, if they want to (in effect, "replaying the log"). They can also react to particular events in the topic, and understand how a particular state was arrived at, neither of which is easy with a traditional DBMS.
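As an illustration of "replaying the log", a subscriber can seek a topic partition back to the beginning and fold every record into its own state. Here is a minimal sketch with the standard Java consumer; the topic, partition, group id, and broker address are placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ReplayLog {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "state-rebuilder");          // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Map<String, String> state = new HashMap<>(); // state rebuilt from the log

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("change-log", 0); // placeholder topic/partition
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToBeginning(Collections.singletonList(tp)); // "replay" from offset 0

            // Poll a few times and apply every record to the local state.
            for (int i = 0; i < 10; i++) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    state.put(record.key(), record.value());
                }
            }
        }
        System.out.println("Rebuilt " + state.size() + " keys from the commit log");
    }
}
```

If the topic is compacted (cleanup.policy=compact), such a replay converges to the latest value per key, which is the "re-syncing mechanism" the Kafka documentation describes.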