
What is the purpose of Cassandra's commit log?

Tags:

cassandra

Could someone please clarify the commit log and its use?

In Cassandra, when a write goes to disk, is the commit log the first entry point, or is it the memtable?

If memtables are what get flushed to disk, what is the commit log for? Is its only purpose to handle sync issues when a data node is down?

Satheesh asked Jan 04 '16 14:01





2 Answers

You can think of the commit log as an optimization, but Cassandra would be unusably slow without it. When MemTables get written to disk, we call them SSTables. SSTables are immutable, meaning that once Cassandra writes them to disk it does not update them. So when a column changes, Cassandra needs to write a new SSTable to disk. If Cassandra wrote an SSTable to disk on every update, it would be completely IO bound and very slow.

So Cassandra uses a few tricks to get better performance. Instead of writing SSTables to disk on every column update, it keeps the updates in memory and flushes those changes to disk periodically to keep the IO to a reasonable level. But this leads to the obvious problem that if the machine goes down or Cassandra crashes you would lose data on that node. To avoid losing data, in addition to keeping recent changes in memory, Cassandra writes the changes to its CommitLog.

You may be asking why writing to the CommitLog is any better than just writing the SSTables. The CommitLog is optimized for writing. Unlike SSTables, which store rows in sorted order, the CommitLog stores updates in the order in which they were processed by Cassandra. The CommitLog also stores changes for all the column families in a single file, so the disk doesn't need to do a bunch of seeks when it is receiving updates for multiple column families at the same time.

Basically, writing the CommitLog to disk is better because it has to write less data than writing SSTables does, and it writes all that data to a single place on disk.
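To make the contrast concrete, here is a minimal sketch of the two structures. It is a toy model, not Cassandra's code: `ToyWritePath`, its fields, and the string record format are all made up for illustration. It only shows that the log is a single append-only sequence shared by every table, while each memtable keeps its own keys sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model; none of these names exist in Cassandra.
class ToyWritePath {
    // One append-only log shared by every table: entries stay in arrival order.
    final List<String> commitLog = new ArrayList<>();
    // One sorted in-memory map per table, standing in for a memtable.
    final Map<String, TreeMap<String, String>> memtables = new TreeMap<>();

    void write(String table, String key, String value) {
        // Sequential append, no sorting and no per-table file.
        commitLog.add(table + ":" + key + "=" + value);
        // Sorted insert into the memtable of the target table.
        memtables.computeIfAbsent(table, t -> new TreeMap<>()).put(key, value);
    }
}

class ToyWritePathDemo {
    public static void main(String[] args) {
        ToyWritePath node = new ToyWritePath();
        node.write("users", "zoe", "1");
        node.write("orders", "o-9", "2");
        node.write("users", "adam", "3");

        // Arrival order, all tables interleaved:
        System.out.println(node.commitLog);
        // prints [users:zoe=1, orders:o-9=2, users:adam=3]

        // Sorted by key, one table only:
        System.out.println(node.memtables.get("users").keySet());
        // prints [adam, zoe]
    }
}
```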

Cassandra keeps track of what data has been flushed to SSTables and is able to truncate the Commit log once all data older than a certain point has been written.

When Cassandra starts up, it has to read the commit log back from the last known good point in time (the point at which we know all previous writes were written to an SSTable). It re-applies the changes in the commit log to its MemTables so it can get back into the state it was in when it stopped. This process can be slow, so if you are stopping a Cassandra node for maintenance it is a good idea to run nodetool drain before shutting it down. That flushes everything in the MemTables to SSTables and makes the amount of work on startup a lot smaller.
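The flush, truncate, and replay cycle described above can be sketched roughly as follows. This is a simplified model under stated assumptions: `ReplaySketch`, `flushedUpTo`, and the per-entry `position` counter are hypothetical names, and real Cassandra tracks positions per segment file rather than per entry in a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model of commit log replay; names are illustrative only.
class ReplaySketch {
    static final class Entry {
        final long position;
        final String key;
        final String value;

        Entry(long position, String key, String value) {
            this.position = position;
            this.key = key;
            this.value = value;
        }
    }

    final List<Entry> commitLog = new ArrayList<>();              // survives a crash
    TreeMap<String, String> memtable = new TreeMap<>();           // lost on a crash
    final TreeMap<String, String> sstableData = new TreeMap<>();  // stands in for SSTables on disk
    long flushedUpTo = 0;   // every entry with position <= this is already in an SSTable
    long nextPosition = 1;

    void write(String key, String value) {
        commitLog.add(new Entry(nextPosition++, key, value));
        memtable.put(key, value);
    }

    // Flush the memtable, then drop the log entries that the flush made redundant.
    void flush() {
        sstableData.putAll(memtable);
        memtable = new TreeMap<>();
        flushedUpTo = nextPosition - 1;
        commitLog.removeIf(e -> e.position <= flushedUpTo);
    }

    // Simulate a crash: everything held only in memory disappears.
    void crash() {
        memtable = new TreeMap<>();
    }

    // On startup, re-apply only the entries written after the last flush.
    int replay() {
        int replayed = 0;
        for (Entry e : commitLog) {
            if (e.position > flushedUpTo) {
                memtable.put(e.key, e.value);
                replayed++;
            }
        }
        return replayed;
    }
}

class ReplaySketchDemo {
    public static void main(String[] args) {
        ReplaySketch node = new ReplaySketch();
        node.write("a", "1");
        node.write("b", "2");
        node.flush();          // a and b are now on disk, the log is truncated
        node.write("c", "3");  // c exists only in the memtable and the log
        node.crash();
        System.out.println(node.replay() + " entry replayed"); // prints 1 entry replayed
    }
}
```

In this model, calling `flush()` right before stopping the node plays the role that nodetool drain plays in the answer: `replay()` then has nothing left to do.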

psanford answered Sep 18 '22 12:09



The write path in Cassandra works like this:

Cassandra Node ----> Commitlog -----------------> Memtable
                         |                           |
                         |                           |
                         |---> Periodically          |---> Periodically
                               sync to disk                flush to SSTable

The Memtable and the CommitLog are NOT really written in parallel: the write to the CommitLog must finish before the write to the Memtable starts. The relevant call stack in the source code is:

org.apache.cassandra.service.StorageProxy.mutateMV:mutation.apply->
org.apache.cassandra.db.Mutation.apply:Keyspace.open(keyspaceName).apply->
org.apache.cassandra.db.Keyspace.apply->
org.apache.cassandra.db.Keyspace.applyInternal{
    Tracing.trace("Appending to commitlog");
    commitLogPosition = CommitLog.instance.add(mutation)
    ...
    Tracing.trace("Adding to {} memtable",...
    ...
    upd.metadata().name(...);
    ...
    cfs.apply(...);
    ...
}
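The ordering in that call stack can be illustrated with a small stand-alone sketch. It is not the Cassandra implementation: `OrderedApplySketch`, `appendToCommitLog`, and the `logAvailable` switch are invented here purely to show that the memtable is only touched after the log append has returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model; it mirrors only the ordering, not Cassandra's API.
class OrderedApplySketch {
    final List<String> commitLog = new ArrayList<>();
    final TreeMap<String, String> memtable = new TreeMap<>();
    boolean logAvailable = true; // made-up switch to simulate a failed append

    // Appends a record and returns its position in the log.
    int appendToCommitLog(String key, String value) {
        if (!logAvailable) {
            throw new IllegalStateException("commit log append failed");
        }
        commitLog.add(key + "=" + value);
        return commitLog.size() - 1;
    }

    int apply(String key, String value) {
        int position = appendToCommitLog(key, value); // step 1: must complete first
        memtable.put(key, value);                     // step 2: only reached if step 1 succeeded
        return position;
    }
}

class OrderedApplyDemo {
    public static void main(String[] args) {
        OrderedApplySketch node = new OrderedApplySketch();
        node.apply("k1", "v1");

        node.logAvailable = false;
        try {
            node.apply("k2", "v2");
        } catch (IllegalStateException e) {
            // The failed log append means the memtable never saw k2.
            System.out.println(node.memtable.containsKey("k2")); // prints false
        }
    }
}
```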

The purpose of the commitlog is to be able to recreate the memtable after a node crashes or gets rebooted. This is important, since the memtable only gets flushed to disk when it's 'full', meaning the configured memtable size is exceeded, or when a flush is triggered through nodetool or OpsCenter. So the data in the memtable is not persisted directly.

Having said that, it is a good idea to call "nodetool flush" before rebooting a node to make sure your memtables are persisted. This also reduces the playback time of the commitlog after the node comes up again.
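As a rough illustration of that flush rule, the sketch below flushes a memtable once it holds more than a fixed number of entries, or when `flush()` is called by hand. The entry-count limit `maxEntries` is an assumption standing in for the configured memtable size, and `FlushingMemtableSketch` is a made-up class, not a Cassandra one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model; an entry count stands in for the real size limit.
class FlushingMemtableSketch {
    final int maxEntries;
    TreeMap<String, String> memtable = new TreeMap<>();
    // Each flushed memtable becomes one read-only, sorted "SSTable".
    final List<Map<String, String>> sstables = new ArrayList<>();

    FlushingMemtableSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    void write(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() > maxEntries) {
            flush(); // automatic flush once the limit is exceeded
        }
    }

    // Manual flush, playing the role of a flush requested by an operator.
    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.add(Collections.unmodifiableMap(memtable));
        memtable = new TreeMap<>();
    }
}

class FlushingMemtableDemo {
    public static void main(String[] args) {
        FlushingMemtableSketch node = new FlushingMemtableSketch(2);
        node.write("a", "1");
        node.write("b", "2");
        node.write("c", "3"); // third entry exceeds the limit and triggers a flush
        node.write("d", "4"); // held in memory only until the next flush
        System.out.println(node.sstables.size() + " flushed, "
                + node.memtable.size() + " in memory"); // prints 1 flushed, 1 in memory
    }
}
```

Until the next flush, the entry for `d` exists only in memory, which is exactly the window the commitlog is there to cover.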

HashtagMarkus answered Sep 21 '22 12:09
