Through OpsCenter and nodetool cfstats I was able to find that one of the partitions of a keyspace table is 560 MB, but I couldn't find out which partition it is. How can we trace which partition of the table is that big?
Try nodetool tablehistograms -- <keyspace> <table>. The command provides statistics about a table, including read/write latency, partition size, column count, and number of SSTables. This gives you proper stats for the table: for example, that the 95th percentile of partition sizes in the raw_data table is 107 MB and the maximum is 3.44 GB. Note that it reports the size distribution only, not which partition key is the large one.
A partition in Cassandra represents a grouping of similar rows. It is recommended to model your data so that similar rows fall in the same partition. This is called the wide partition pattern. Lookups by partition key are very fast in Cassandra.
Cassandra uses a consistent hashing technique: it hashes the partition key (app_name in this example) to a token and assigns the row data to the node that owns the token range containing that value.
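The token-range lookup described above can be sketched roughly as follows. This is only a toy illustration: the MD5-based token_for function is a stand-in (Cassandra's default partitioner uses Murmur3), and the node names and token values are made up for the example.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit token.

    Stand-in hash for illustration only; this is NOT Cassandra's Murmur3.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Toy ring: each node owns the range (previous token, its own token]."""

    def __init__(self, node_tokens):
        # node_tokens maps a (hypothetical) node name to the token it owns.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def owner_of_token(self, token: int) -> str:
        # First node whose token is >= the key's token; wrap to the start
        # of the ring when the token is past the last node.
        idx = bisect.bisect_left(self._tokens, token)
        return self._ring[idx % len(self._ring)][1]

    def owner(self, partition_key: str) -> str:
        return self.owner_of_token(token_for(partition_key))


# Hypothetical three-node ring covering the 64-bit token space.
ring = TokenRing({"node-a": 2**62, "node-b": 2**63, "node-c": 2**64 - 1})
print(ring.owner("app_name"))
```

The same key always maps to the same node, which is why every row sharing a partition key ends up in one partition, and why one hot key can grow into a very large partition.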
Cassandra allows you to use multiple columns as the partition key for a table; this is called a composite partition key. Unlike a simple partition key, a composite partition key is used when the data stored under a single column value would be too large to reside in one partition. The combination of all the key columns then determines where the data resides.
The fastest possible way is to look for messages in the log about compacting large partitions. Sort of a cheat, but it often works.
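A small script can do the log search. The sketch below assumes the warning looks like "... large partition <keyspace>/<table>:<key> (<N> bytes)", which is the shape used by some Cassandra versions; other versions print a human-readable size instead of a byte count, so check a real line from your own log and adjust the pattern. The function name is made up for this example.

```python
import re

# ASSUMED message shape: "large partition <keyspace>/<table>:<key> (<N> bytes)".
# Verify against your own system.log; the wording differs between versions.
LARGE_PARTITION = re.compile(
    r"large partition\s+(?P<table>[^/\s]+/[^:\s]+):(?P<key>.+?)\s+\((?P<size>\d+) bytes"
)


def find_large_partitions(lines):
    """Return (size_in_bytes, 'keyspace/table', partition_key), largest first."""
    hits = []
    for line in lines:
        match = LARGE_PARTITION.search(line)
        if match:
            hits.append(
                (int(match.group("size")), match.group("table"), match.group("key"))
            )
    return sorted(hits, reverse=True)
```

To use it, pass an open log file, for example `with open(path) as f: find_large_partitions(f)`, where path points at your node's system.log (the location depends on how Cassandra was installed).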
Short of that, you'll need to dump the SSTables to JSON and then inspect the JSON. A number of people have written tools for this and published them online; https://github.com/BrianGallew/cassandra_tools is one example.
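If you'd rather inspect the JSON yourself, something along these lines may be enough. It assumes the dump is a JSON array of objects with a "partition" entry (holding a "key" list) and a "rows" list, which is the layout I'd expect from sstabledump; older sstable2json output is shaped differently. The serialized JSON length is only a rough proxy for the on-disk partition size, and rank_partitions is a made-up name.

```python
import json


def rank_partitions(dump):
    """Rank partitions from an already-parsed dump, biggest first.

    Returns (approx_json_bytes, row_count, partition_key) tuples.
    ASSUMES entries look like {"partition": {"key": [...]}, "rows": [...]}.
    """
    ranked = []
    for entry in dump:
        key = ":".join(str(part) for part in entry["partition"]["key"])
        rows = entry.get("rows", [])
        # Length of the re-serialized rows: a proxy, not the real on-disk size.
        approx_bytes = len(json.dumps(rows))
        ranked.append((approx_bytes, len(rows), key))
    return sorted(ranked, reverse=True)
```

Load the dump with json.load first and pass the resulting list in. For a very large SSTable the whole dump may not fit in memory, in which case a streaming JSON parser would be a better fit than this sketch.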