Forgive me, I am just learning Kafka. I have encountered the term "commit log" many times while reading the Kafka material, but I still have no idea what exactly it is. The links I mean are below.
https://kafka.apache.org/documentation/#uses_commitlog
Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data.
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
One of the most useful things I learned in all this was that many of the things we were building had a very simple concept at their heart: the log. Sometimes called write-ahead logs or commit logs or transaction logs, …
https://kafka.apache.org/protocol.html#protocol_partitioning
Kafka is a partitioned system so not all servers have the complete data set. Instead recall that topics are split into a pre-defined number of partitions, P, and each partition is replicated with some replication factor, N. Topic partitions themselves are just ordered "commit logs" numbered 0, 1, ..., P.
What does "commit log" mean? Is it any different from the concept in a DBMS? How should I understand it? Thanks.
A commit log is a record of transactions. It is used to keep track of what is happening and to help with, for example, disaster recovery: generally, every commit is written to the log before it is applied, so transactions that were in flight when the server went down can be recovered and re-applied by reading the log.
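To make that pattern concrete, here is a minimal sketch in Java (not Kafka's or any database's actual implementation; the class, file format, and file name are made up for illustration). Each change is appended and flushed to a log file before it is applied to the in-memory state, so the state can be rebuilt by replaying the file after a crash:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

// Hypothetical illustration of the commit-log pattern:
// durably append the change to the log BEFORE applying it to the state.
public class TinyCommitLog {
    private final Path logFile;
    private final Map<String, String> state = new HashMap<>(); // the "database"

    public TinyCommitLog(Path logFile) throws IOException {
        this.logFile = logFile;
        replay(); // recover state from the log after a restart or crash
    }

    public void put(String key, String value) throws IOException {
        String entry = "PUT\t" + key + "\t" + value + System.lineSeparator();
        // 1. Record the intent durably first ...
        Files.write(logFile, entry.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                StandardOpenOption.SYNC);
        // 2. ... then apply it to the live state.
        state.put(key, value);
    }

    public String get(String key) {
        return state.get(key);
    }

    private void replay() throws IOException {
        if (!Files.exists(logFile)) return;
        // Re-apply every logged change in order to rebuild the state.
        for (String line : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t");
            if (parts.length == 3 && parts[0].equals("PUT")) {
                state.put(parts[1], parts[2]);
            }
        }
    }
}
```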
In Postgres, this role is played by the write-ahead log (WAL). Each write to a Postgres database must first be recorded in the WAL before the data is changed in a table or an index. One benefit is that it speeds up database writes: appending to a log is relatively fast, even on disk, because it is sequential I/O.
Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction feature in Kafka helps support this usage. In this usage Kafka is similar to Apache BookKeeper project.
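Regarding the log-compaction usage mentioned in that quote: a compacted topic retains at least the latest record for each key, which is what makes it practical as an "external commit log" that failed nodes can re-sync from. As a sketch, such a topic can be created with the standard Java AdminClient; the topic name, partition count, replication factor, and broker address below are placeholders:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 1 -- illustrative values only.
            NewTopic topic = new NewTopic("change-log", 3, (short) 1)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```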
Conceptually there's no difference between the "commit log" that Kafka provides and the commit log/transaction log/write ahead log that a DBMS uses: They're both about recording the changes made to something so that it can be replayed later.
In the case of a DBMS this replay will happen if the DB was not shut down cleanly and is necessary to ensure the DB resumes service in a consistent state. Importantly, in a DB this commit log is an implementation detail of the database and is not a concern of the database clients.
In a Kafka application this commit log is a first class concept. Subscribers to a topic can reconstruct the state of the application for themselves, if they want to (in effect, "replaying the log"). They can also react to particular events in the topic, and understand how a particular state was arrived at, neither of which is easy with a traditional DBMS.
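As an illustration of "replaying the log", a subscriber can seek a topic partition back to the beginning and fold every record into its own state. Here is a minimal sketch with the standard Java consumer; the topic, partition, group id, and broker address are placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ReplayLog {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "state-rebuilder");          // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Map<String, String> state = new HashMap<>(); // state rebuilt from the log

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("change-log", 0); // placeholder topic/partition
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToBeginning(Collections.singletonList(tp)); // "replay" from offset 0

            // Poll a few times and apply every record to the local state.
            for (int i = 0; i < 10; i++) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    state.put(record.key(), record.value());
                }
            }
        }
        System.out.println("Rebuilt " + state.size() + " keys from the commit log");
    }
}
```

If the topic is compacted (cleanup.policy=compact), such a replay converges to the latest value per key, which is the "re-syncing mechanism" the Kafka documentation describes.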