Read path and compression offset map

Tags:

cassandra

I'm trying to understand the Cassandra read path and can't see why we need a compression offset map.

https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutReads.html


The partition index resides on disk and stores an index of all partition keys mapped to their offset.

The compression offset map stores pointers to the exact location on disk that the desired partition data will be found.

Why do we need both of them? Why can't the partition index store pointers to the exact location on disk?

I'm sorry for the stupid title, but that's what Stack Overflow asked for; I couldn't use "Why do we need a compression offset map if we have a partition index?"

820

asked May 31 '18 19:05

1 Answer

The data file is compressed in chunks. By default, 64k of data is compressed, then the next 64k, and so on. The offsets written to the index file are those of the uncompressed data. This is because, as it writes, Cassandra knows how many uncompressed bytes have been written so far and uses that count to mark where each new partition starts. The compression offset map relates uncompressed positions to the compressed offsets of the chunks, so a reader knows which chunk to start decompressing to get to the partition at a given uncompressed offset from the index.
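The lookup described above can be sketched roughly as follows. This is only an illustration, assuming a fixed 64k uncompressed chunk length and a hypothetical in-memory list of compressed chunk start offsets; the names `locate` and `chunk_offsets` are made up here and are not Cassandra's actual classes or on-disk format.

```python
CHUNK_LENGTH = 64 * 1024  # uncompressed bytes per chunk (the default mentioned above)


def locate(uncompressed_offset, chunk_offsets, chunk_length=CHUNK_LENGTH):
    """Map an uncompressed offset (as stored in the partition index) to
    (compressed file offset of the containing chunk, offset inside that
    chunk once it has been decompressed)."""
    chunk_index = uncompressed_offset // chunk_length
    return chunk_offsets[chunk_index], uncompressed_offset % chunk_length


# Hypothetical offset map: where each compressed chunk begins in the data file.
chunk_offsets = [0, 21000, 40500, 61234]

# Suppose the partition index says the partition starts at uncompressed offset 150000.
compressed_start, skip = locate(150_000, chunk_offsets)
print(compressed_start, skip)  # 40500 18928
```

So the reader would seek to byte 40500 in the compressed file, decompress that chunk, and skip 18928 bytes into the result to reach the partition.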

If a partition sits in the middle of a 64k compressed chunk, you need to decompress that entire chunk. You cannot start reading in the middle of it because of how the compression algorithms work. This is why in some situations it makes sense to decrease the chunk size: it reduces the overhead of reading a tiny partition.
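To make the overhead concrete, here is a small self-contained sketch that compresses a buffer in independent chunks and counts how many bytes must be decompressed to read 100 bytes from the middle. It uses Python's `zlib` purely as a stand-in compressor, and the helper names `compress_in_chunks` and `read_range` are hypothetical; it is not how Cassandra itself is implemented.

```python
import zlib


def compress_in_chunks(data, chunk_length):
    """Compress `data` chunk by chunk; return the blob and each chunk's start offset."""
    chunks, offsets, pos = [], [], 0
    for i in range(0, len(data), chunk_length):
        compressed = zlib.compress(data[i:i + chunk_length])
        offsets.append(pos)
        chunks.append(compressed)
        pos += len(compressed)
    return b"".join(chunks), offsets


def read_range(blob, offsets, chunk_length, start, length):
    """Read `length` uncompressed bytes at `start`.
    Returns (payload, total uncompressed bytes that had to be decompressed)."""
    first = start // chunk_length
    last = (start + length - 1) // chunk_length
    out, decompressed_total = b"", 0
    for idx in range(first, last + 1):
        begin = offsets[idx]
        end = offsets[idx + 1] if idx + 1 < len(offsets) else len(blob)
        chunk = zlib.decompress(blob[begin:end])  # whole chunk, no partial reads
        decompressed_total += len(chunk)
        out += chunk
    skip = start - first * chunk_length
    return out[skip:skip + length], decompressed_total


data = bytes(i % 251 for i in range(256 * 1024))
results = {}
for chunk_length in (64 * 1024, 4 * 1024):
    blob, offsets = compress_in_chunks(data, chunk_length)
    payload, cost = read_range(blob, offsets, chunk_length, 100_000, 100)
    assert payload == data[100_000:100_100]
    results[chunk_length] = cost
    print(chunk_length, cost)
```

With 64k chunks the 100-byte read decompresses 65536 bytes; with 4k chunks it decompresses 4096. The trade-off is that smaller chunks mean more entries in the offset map and usually a somewhat worse compression ratio.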

166

answered Nov 05 '22 17:11

Chris Lohfink


MaxNevermind
