I'm trying to understand the Cassandra read path and can't see why we need a compression offset map.
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutReads.html
The partition index resides on disk and stores an index of all partition keys mapped to their offset.
The compression offset map stores pointers to the exact location on disk that the desired partition data will be found.
Why do we need both of them? Why can't the partition index store pointers to the exact location on disk?
I'm sorry for a stupid title, but that's what stackoverflow asked me, I couldn't use "Why do we need a compression offset map if we have a partition index?"
The read path is complex and uses several data structures (both in memory and on disk) to optimize reads and reduce disk seeks. Cassandra has to combine the data in memtables with the data on disk (potentially spread across multiple SSTables) before returning a result.
Cassandra Interactions on the Read Path
To satisfy a read, Cassandra must combine results from the active memtable and potentially multiple SSTables. Cassandra processes data at several stages on the read path to discover where the data is stored, starting with the data in the memtable and finishing with SSTables.
Key caching: for frequently accessed data, the key cache reduces the number of seeks into the SSTable.
The data file is compressed in chunks. By default, 64 KB of data is compressed, then the next 64 KB, and so on. The offsets written to the index file are offsets into the uncompressed data. This is because, as Cassandra writes the file, it knows how many uncompressed bytes have been written so far and uses that count to mark where each new partition starts. The compression offset map records where each compressed chunk starts on disk and which uncompressed positions it covers, so Cassandra knows which chunk to start decompressing to reach the partition at a given uncompressed offset from the index.
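To make the two-step lookup concrete, here is a toy model in Python. It is only a sketch of the idea, not Cassandra's actual file format: `write_compressed` and `read_at` are made-up names, and zlib stands in for whatever compressor the table really uses. The partition index would supply `uncompressed_offset`; the list of chunk offsets plays the role of the compression offset map.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # uncompressed bytes per chunk (the 64 KB default mentioned above)


def write_compressed(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress data chunk by chunk (hypothetical helper).

    Returns the compressed bytes and a list where entry i is the position
    in the compressed output at which chunk i starts.
    """
    out = bytearray()
    chunk_offsets = []
    for start in range(0, len(data), chunk_size):
        chunk_offsets.append(len(out))
        out += zlib.compress(data[start:start + chunk_size])
    return bytes(out), chunk_offsets


def read_at(compressed: bytes, chunk_offsets: list, uncompressed_offset: int,
            length: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read `length` bytes starting at an uncompressed offset (hypothetical helper)."""
    result = bytearray()
    pos = uncompressed_offset
    end = uncompressed_offset + length
    while pos < end:
        chunk_index = pos // chunk_size  # which chunk covers this position
        if chunk_index >= len(chunk_offsets):
            break  # requested range runs past the end of the data
        chunk_start = chunk_offsets[chunk_index]
        if chunk_index + 1 < len(chunk_offsets):
            chunk_end = chunk_offsets[chunk_index + 1]
        else:
            chunk_end = len(compressed)
        # The whole chunk must be decompressed, even for a few bytes.
        chunk = zlib.decompress(compressed[chunk_start:chunk_end])
        within = pos - chunk_index * chunk_size
        piece = chunk[within:within + (end - pos)]
        if not piece:
            break
        result += piece
        pos += len(piece)
    return bytes(result)


# Build a tiny "data file" and a "partition index" of uncompressed offsets.
partitions = {"k1": b"a" * 100_000, "k2": b"hello partition", "k3": b"z" * 50_000}
data = bytearray()
partition_index = {}
for key, value in partitions.items():
    partition_index[key] = len(data)  # offset known while writing, before compression
    data += value

compressed, chunk_offsets = write_compressed(bytes(data))
found = read_at(compressed, chunk_offsets, partition_index["k2"], len(partitions["k2"]))
```

The writer only ever needs to count uncompressed bytes to build the index, while the reader uses the chunk offsets to translate that position into a place in the compressed file.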
If a partition sits in the middle of a 64 KB compressed chunk, you need to decompress that entire chunk. You cannot start reading in the middle of it because of how the compression algorithms work. This is why in some situations it makes sense to decrease the chunk size, as it reduces the overhead of reading a tiny partition.
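A rough back-of-the-envelope calculation shows the effect. The function below is a hypothetical helper, not a Cassandra API; it gives an upper bound on the uncompressed bytes that must be produced to read one partition, assuming every chunk touched is full size.

```python
def bytes_decompressed(partition_offset: int, partition_size: int, chunk_size: int) -> int:
    """Upper bound on uncompressed bytes produced to read one partition (illustrative only)."""
    first_chunk = partition_offset // chunk_size
    last_chunk = (partition_offset + partition_size - 1) // chunk_size
    return (last_chunk - first_chunk + 1) * chunk_size


# A 200-byte partition at uncompressed offset 100,000:
with_64k = bytes_decompressed(100_000, 200, 64 * 1024)  # one 64 KB chunk
with_4k = bytes_decompressed(100_000, 200, 4 * 1024)    # one 4 KB chunk
```

The trade-off is that smaller chunks mean more entries in the compression offset map and usually a worse compression ratio, so the right size depends on how large typical partitions are.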