
Internals of the Partition Summary in Cassandra

I was watching: https://academy.datastax.com/courses/learning-cassandra-read-path/understanding-partition-summaries-and-indexes and I have a question regarding this presentation.

What does the Partition Summary actually represent? :)

My first idea was that it is just a cache that keeps the locations of x% of the keys. That would imply that approximately one request in 126 can get a key directly and the other 125 must traverse the whole table. But I think that would be pretty inefficient.

My second idea was that the Partition Summary is somehow able, for a specified key, to give you a range of index entries where the row for that key should exist. But I can't imagine how this could be implemented, especially if this table should be of size |Partition Index| / index_interval.

Another question that comes to mind: can an SSTable keep many entries for a specific key?

Thanks, krzychusan

user617768 asked Dec 02 '25 21:12



1 Answer

The partition summary is a sampling of the partition index. The partition summary is an in-memory structure that reduces the amount of scan time required to find a partition key within the index.

A very simple example will help explain the concept.

Assume the partition index file has 100 partition keys in it: pk001 to pk100. The partition keys are stored in sorted order, so we know that pk027 comes after pk025.

In this oversimplified example, if the partition summary were set to sample every 10 partition keys, then it would contain a map of ten partition keys to their locations on disk within the partition index. For example, pk001 -> beginning of file, pk010 -> location of pk010 in the index file, and so on.

Now, when C* gets a request for pk027, it knows that pk027 is located after pk020. The summary (which samples every 10 partition keys) also knows the exact location of pk020.

So, C* does a seek to the location of pk020 within the index file based on the information provided by the summary. It then performs a very short scan from pk020 to pk027.
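The seek-then-scan idea can be sketched in a few lines of Python. This is only a toy model of the example above, not Cassandra's actual code: the names `build_summary`, `lookup`, `INDEX_INTERVAL`, and `ENTRY_SIZE` are made up for illustration, index entries are assumed to have a fixed size, and plain string order stands in for whatever ordering the real index uses. Because the sketch samples by position (every 10th entry starting from the first), its sampled keys are pk001, pk011, pk021, and so on, so a lookup of pk027 seeks to pk021 rather than pk020. The principle is the same.

```python
from bisect import bisect_right

INDEX_INTERVAL = 10  # assumed sampling interval, as in the example above
ENTRY_SIZE = 32      # hypothetical fixed size (bytes) of one index entry

# Toy "partition index": sorted (partition key, offset in data file) pairs.
partition_index = [(f"pk{i:03d}", (i - 1) * ENTRY_SIZE) for i in range(1, 101)]


def build_summary(index, interval):
    """Sample every `interval`-th index entry as (key, position in index)."""
    return [(index[pos][0], pos) for pos in range(0, len(index), interval)]


def lookup(index, summary, key):
    """Return (data offset or None, number of index entries scanned)."""
    sampled_keys = [k for k, _ in summary]
    # Binary search the in-memory summary for the last sampled key <= key.
    slot = bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None, 0  # key sorts before the first key in the index
    start = summary[slot][1]  # the "seek" into the index
    scanned = 0
    for k, offset in index[start:]:  # the short sequential scan
        scanned += 1
        if k == key:
            return offset, scanned
        if k > key:
            break  # passed the spot where the key would be
    return None, scanned


summary = build_summary(partition_index, INDEX_INTERVAL)
print(len(summary))                               # -> 10
print(lookup(partition_index, summary, "pk027"))  # -> (832, 7)
```

With 100 keys and an interval of 10, the summary holds only 10 entries, and any lookup scans at most 10 index entries after the seek. That is the trade-off the question asks about: the summary has roughly |Partition Index| / interval entries, yet it bounds the scan for every key, not just the sampled ones.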

In summary, the partition summary is an in-memory sampling of the partition index file that allows Cassandra to seek to the approximate location of a partition within the index file and then perform a very short scan.

Akbar Ahmed answered Dec 04 '25 15:12
