Intro
I'm trying to gather some stats from a Cassandra 1.2.6 cluster to implement a web service to provide those stats to a web app. I'm accessing the cluster from Python using the cql library, but I can ssh or pssh to the nodes as well.
The problem
My problem is how to get the total table size (i.e. the actual disk usage of each table) in the entire cluster, and if possible the total row count of each table (this can be an estimate).
The question
So far the only option I've found is to run nodetool cfstats on each node and parse the output. Is there a better way of doing this?
Thanks in advance!
I think the best way to do this would be to access the statistics directly through JMX, which is how nodetool itself works. Each node provides a wide range of metrics, but the ones you would be interested in are:
org.apache.cassandra.metrics
ColumnFamily
cf_name
TotalDiskSpaceUsed
MemtableDataSize
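If JMX is awkward to reach from Python, the nodetool route from the question still works. Below is a minimal sketch of it, assuming you have already collected the text output of nodetool cfstats from every node (for example over ssh or pssh). The field labels in the regular expressions ("Column Family:", "Space used (total):", "Number of Keys (estimate):") are what I would expect from 1.2.x output, so treat them as assumptions and check them against your own nodes. The function names parse_cfstats and cluster_totals are made up for this example.

```python
import re
from collections import defaultdict

# Assumed field labels from `nodetool cfstats` text output on Cassandra 1.2.x;
# verify them against your own nodes before relying on this.
KEYSPACE_RE = re.compile(r"^Keyspace:\s*(\S+)")
TABLE_RE = re.compile(r"^Column Family:\s*(\S+)")
SPACE_RE = re.compile(r"^Space used \(total\):\s*(\d+)")
KEYS_RE = re.compile(r"^Number of Keys \(estimate\):\s*(\d+)")


def parse_cfstats(text):
    """Return {(keyspace, table): {"bytes": int, "keys": int}} for one node."""
    stats = {}
    keyspace = table = None
    for raw in text.splitlines():
        line = raw.strip()
        m = KEYSPACE_RE.match(line)
        if m:
            # A new keyspace section starts; no table is selected yet.
            keyspace, table = m.group(1), None
            continue
        m = TABLE_RE.match(line)
        if m:
            table = m.group(1)
            stats[(keyspace, table)] = {"bytes": 0, "keys": 0}
            continue
        if table is None:
            # Keyspace-level counters appear before the first table; skip them.
            continue
        m = SPACE_RE.match(line)
        if m:
            stats[(keyspace, table)]["bytes"] = int(m.group(1))
            continue
        m = KEYS_RE.match(line)
        if m:
            stats[(keyspace, table)]["keys"] = int(m.group(1))
    return stats


def cluster_totals(per_node_outputs):
    """Sum per-table disk usage and key estimates over all node outputs."""
    totals = defaultdict(lambda: {"bytes": 0, "keys": 0})
    for text in per_node_outputs:
        for key, values in parse_cfstats(text).items():
            totals[key]["bytes"] += values["bytes"]
            totals[key]["keys"] += values["keys"]
    return dict(totals)
```

Call cluster_totals with a list containing one cfstats output string per node. Keep in mind that each node reports only its local data, so the summed figures include every replica: the byte total is the real disk footprint across the cluster, while the summed key estimate will overcount distinct rows when the replication factor is greater than 1.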