nodetool cfstats/tablestats reports the "Compacted partition maximum bytes". How can I find the key of that partition, or of other very large partitions? The goal is to analyse why these partitions grow so large and to correct the data model accordingly. I have seen that these partition keys can appear in the logs, but unfortunately my logs are removed periodically.


Search key of big partition in cassandra

3 Answers

You might look at the nodetool toppartitions command, which samples activity on a table and reports the most active partitions. The most active partitions are not necessarily the largest ones, but the output can still help you analyze and manage your data.

answered Oct 08 '22 12:10

kikulikov

You can use the Instaclustr tools:

https://www.instaclustr.com/support/documentation/tools/ic-tools-for-cassandra-sstables/

The following command is useful for finding big partitions:

ic-pstats [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

-n <num>    Number of partitions to display in leaders lists
-t <name>   Snapshot to analyse (snapshot name from nodetool listsnapshots). Snapshot is created if none is specified.
-f <files>  Comma separated list of Data.db sstables to filter on

Another useful tool is sstable-tools:

https://github.com/tolbertam/sstable-tools

It has a describe command that shows the widest and largest partitions:

java -jar sstable-tools.jar describe ma-2-big-Data.db

The output is like this:

/Users/clohfink/git/sstable-tools/./src/test/resources/ma-2-big-Data.db
=======================================================================
Partitions: 1
Rows: 1
Tombstones: 0
Cells: 4
Widest Partitions:
   [frodo] 1
Largest Partitions:
   [frodo] 104 (104 B)
Tombstone Leaders:
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Size: 50 (50 B)
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
  Compression ratio: -1.0
Minimum timestamp: 1455937221199050 (02/19/2016 21:00:21)
Maximum timestamp: 1455937221199050 (02/19/2016 21:00:21)
SSTable min local deletion time: 2147483647 (01/18/2038 21:14:07)
SSTable max local deletion time: 2147483647 (01/18/2038 21:14:07)
TTL min: 0 (0 milliseconds)
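
If you run describe over many sstables, it may help to pull the "Largest Partitions" entries out of the text automatically. The following is only a minimal sketch: it assumes the output layout shown above (a "Largest Partitions:" line followed by indented "[key] size" entries), and the function name largest_partitions is hypothetical, not part of sstable-tools.

```python
import re
from typing import List, Tuple


def largest_partitions(describe_output: str) -> List[Tuple[str, int]]:
    """Return (partition key, size in bytes) pairs from describe output.

    Assumes the layout shown above: a 'Largest Partitions:' header line
    followed by indented entries such as '   [frodo] 104 (104 B)'.
    """
    results = []
    in_section = False
    for line in describe_output.splitlines():
        if line.strip() == "Largest Partitions:":
            in_section = True
            continue
        if in_section:
            # An entry is an indented line: [key] followed by a byte count.
            match = re.match(r"\s+\[(.*)\]\s+(\d+)", line)
            if match is None:
                break  # first line that is not an entry ends the section
            results.append((match.group(1), int(match.group(2))))
    return results
```

Feeding it the output above should give a single entry for the key frodo with a size of 104 bytes. Collecting these pairs across all sstables of a table and sorting by size gives a rough list of candidate keys to investigate.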

answered Oct 08 '22 11:10

Amir Hossein Javan

Maybe you can use an external tool like Apache Drill or Presto to run a query like:

SELECT key1, key2, COUNT(*) AS total
FROM yourTable
GROUP BY key1, key2
ORDER BY total DESC
LIMIT 10;

Where key1 and key2 are part of your partition key.

This query returns the top 10 partitions by row count, which is only a proxy for their size in bytes.
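
If you already have the rows outside Cassandra (for example from an export), the same aggregation can be sketched in plain Python. This is an illustration only: the function name top_partitions_by_rows and the assumption that each row is a dict containing the partition key columns are mine, not part of any tool mentioned here.

```python
from collections import Counter
from typing import Any, Iterable, List, Mapping, Sequence, Tuple


def top_partitions_by_rows(
    rows: Iterable[Mapping[str, Any]],
    key_columns: Sequence[str],
    limit: int = 10,
) -> List[Tuple[tuple, int]]:
    """Count rows per partition key and return the `limit` largest counts.

    Mirrors GROUP BY key1, key2 ORDER BY total DESC LIMIT 10: each row is
    reduced to the tuple of its partition key columns, then tallied.
    """
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return counts.most_common(limit)
```

Like the query, this counts rows rather than bytes, so a partition with few but very wide rows can still be missed.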

Hope this can help you.

answered Oct 08 '22 10:10

Guillaume S


Tags:

cassandra

crak
