
Find the key of a big partition in Cassandra

Tags:

cassandra

nodetool cfstats/tablestats reports the "Compacted partition maximum bytes" for each table.

How can I find the key of that partition, or of other huge partitions?

The goal is to analyse why these partitions are growing so large and to correct the data model accordingly.

I have seen that these partition keys can show up in the logs, but unfortunately my logs are periodically removed.

crak asked Nov 04 '16 11:11





3 Answers

Take a look at the nodetool toppartitions command, which samples activity on a table and shows the most active partitions. These are not necessarily the largest ones, but the result can help you analyze and manage your data.
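
As a rough sketch of how this could be scripted, the Python snippet below builds the nodetool toppartitions command line. The helper names and the keyspace and table names (my_keyspace, my_table) are placeholders, not part of the original answer. It assumes the usual form nodetool toppartitions [-k <topCount>] <keyspace> <table> <duration>, with the duration in milliseconds; check nodetool help toppartitions on your version before relying on it.

```python
import shlex
import subprocess


def toppartitions_command(keyspace, table, duration_ms, top_count=10):
    """Build the argv list for `nodetool toppartitions`.

    Assumed usage:
        nodetool toppartitions [-k <topCount>] <keyspace> <table> <duration>
    where the sampling duration is given in milliseconds.
    """
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    return [
        "nodetool", "toppartitions",
        "-k", str(top_count),
        keyspace, table, str(duration_ms),
    ]


def run_toppartitions(keyspace, table, duration_ms, top_count=10):
    """Run the sampler on the local node and return its raw text output."""
    cmd = toppartitions_command(keyspace, table, duration_ms, top_count)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Placeholder names: replace with a real keyspace and table.
    cmd = toppartitions_command("my_keyspace", "my_table", 10_000)
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to inspect the command before executing it on a production node.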

kikulikov answered Oct 08 '22 12:10



You can use the Instaclustr ic-tools for Cassandra SSTables:

https://www.instaclustr.com/support/documentation/tools/ic-tools-for-cassandra-sstables/

The following command is useful for finding big partitions:

ic-pstats [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

-n <num>       Number of partitions to display in leaders lists
-t <snapshot>  Snapshot to analyse (snapshot name from nodetool listsnapshots). Snapshot is created if none is specified.
-f <filter>    Comma-separated list of Data.db sstables to filter on

Another useful tool is sstable-tools:

https://github.com/tolbertam/sstable-tools

It has a describe command that shows the widest and largest partitions:

java -jar sstable-tools.jar describe ma-2-big-Data.db

The output looks like this:

/Users/clohfink/git/sstable-tools/./src/test/resources/ma-2-big-Data.db
=======================================================================
Partitions: 1
Rows: 1
Tombstones: 0
Cells: 4
Widest Partitions:
   [frodo] 1
Largest Partitions:
   [frodo] 104 (104 B)
Tombstone Leaders:
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Size: 50 (50 B)
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
  Compression ratio: -1.0
Minimum timestamp: 1455937221199050 (02/19/2016 21:00:21)
Maximum timestamp: 1455937221199050 (02/19/2016 21:00:21)
SSTable min local deletion time: 2147483647 (01/18/2038 21:14:07)
SSTable max local deletion time: 2147483647 (01/18/2038 21:14:07)
TTL min: 0 (0 milliseconds)
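
If you want to collect the leaders from many SSTables, the text output above can be post-processed. The following is a minimal Python sketch that pulls the entries listed under "Largest Partitions:" out of such output. It assumes the layout shown in the sample (a section header followed by indented "[key] value" lines); the function name and the regular expression are illustrative and not part of sstable-tools.

```python
import re

# Matches leader lines such as "   [frodo] 104 (104 B)"; the trailing
# human-readable size in parentheses is optional.
LEADER_LINE = re.compile(r"^\s+\[(?P<key>.*)\]\s+(?P<value>\d+)(?:\s+\(.*\))?\s*$")


def parse_leaders(describe_output, section="Largest Partitions"):
    """Return [(partition_key, value), ...] listed under the given section header."""
    leaders = []
    in_section = False
    for line in describe_output.splitlines():
        if line.strip() == section + ":":
            in_section = True
            continue
        if in_section:
            match = LEADER_LINE.match(line)
            if match is None:
                break  # first non-leader line ends the section
            leaders.append((match.group("key"), int(match.group("value"))))
    return leaders


# Excerpt of the sample output shown above.
SAMPLE = """\
Widest Partitions:
   [frodo] 1
Largest Partitions:
   [frodo] 104 (104 B)
Tombstone Leaders:
"""

if __name__ == "__main__":
    for key, size in parse_leaders(SAMPLE):
        print(f"{key}: {size} bytes")
```

Passing section="Widest Partitions" returns the entries of that section instead.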
Amir Hossein Javan answered Oct 08 '22 11:10



You could use an external query engine such as Apache Drill or Presto to run a query like:

SELECT key1, key2, COUNT(*) AS total
FROM yourTable
GROUP BY key1, key2
ORDER BY total DESC
LIMIT 10;

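Here key1 and key2 are the columns that make up your partition key.

This query returns the 10 partitions with the most rows. Row count is only a proxy for size: a partition with few but very wide rows will not show up.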
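
The same aggregation can be sketched in plain Python, for instance over rows exported from the table. The function name, the sample rows and the column names are illustrative only. Like the query, it ranks partitions by number of rows, which only approximates their size on disk.

```python
from collections import Counter


def top_partitions_by_rows(rows, key_columns, limit=10):
    """Count rows per partition key and return the `limit` largest groups.

    rows        -- iterable of dicts, one per table row
    key_columns -- names of all columns that make up the partition key
    """
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return counts.most_common(limit)


if __name__ == "__main__":
    # Made-up rows standing in for data read from yourTable.
    rows = [
        {"key1": "a", "key2": 1, "value": "x"},
        {"key1": "a", "key2": 1, "value": "y"},
        {"key1": "a", "key2": 2, "value": "z"},
        {"key1": "b", "key2": 1, "value": "w"},
        {"key1": "a", "key2": 1, "value": "v"},
    ]
    for key, total in top_partitions_by_rows(rows, ["key1", "key2"], limit=2):
        print(key, total)
```

All partition key columns must be listed in key_columns; grouping on only some of them would merge several partitions into one count.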

Hope this can help you.

Guillaume S answered Oct 08 '22 10:10
