Operation Time Out Error in cqlsh console of cassandra

I have a three-node Cassandra cluster, and I have created one table that has more than 2,000,000 rows.

When I execute the query select count(*) from userdetails in cqlsh, I get this error:

OperationTimedOut: errors={}, last_host=192.168.1.2

When I run the count on fewer rows, or with a limit of 50,000, it works fine.

Kaushal asked Apr 01 '15 15:04




1 Answer

count(*) actually pages through all the data, so a select count(*) from userdetails without a limit would be expected to time out with that many rows. Some details here: http://planetcassandra.org/blog/counting-key-in-cassandra/

You may want to consider maintaining the count yourself, using Spark, or, if you just want a ballpark number, grabbing it from JMX.

Getting the number from JMX can be a little tricky, depending on your data model. To get the number of partitions, grab the org.apache.cassandra.metrics:type=ColumnFamily,keyspace={{Keyspace}},scope={{Table}},name=EstimatedColumnCountHistogram mbean and sum up all 90 values (this is what nodetool cfstats outputs). It only counts partitions that exist in sstables, so to make it more accurate you can do a flush, or try to estimate the number in memtables from the MemtableColumnsCount mbean.
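As a rough illustration of the summing step, here is a minimal Python sketch. It assumes you have already read the 90 histogram values from JMX with whatever tool you prefer; the function name, the sample values, and the optional memtable guess are hypothetical and are not part of any Cassandra API.

```python
def partitions_from_histogram(bucket_counts, memtable_estimate=0):
    """Sum the histogram bucket values read from the JMX mbean.

    bucket_counts: the values of EstimatedColumnCountHistogram, fetched
        separately (this sketch does not talk to JMX itself).
    memtable_estimate: an optional, caller-supplied guess for partitions
        that are still in memtables and not yet flushed to sstables.
    """
    return sum(bucket_counts) + memtable_estimate


# Made-up sample: 88 empty buckets and two populated ones.
sample_buckets = [0] * 88 + [1500, 500]
print(partitions_from_histogram(sample_buckets))
print(partitions_from_histogram(sample_buckets, memtable_estimate=25))
```

The result is still an estimate for a single node, since the mbean only reflects that node's sstables.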

For a very basic ballpark number, you can grab the estimated partition counts from system.size_estimates and sum them across all the ranges listed (note that this covers only one node). Multiply that by the number of nodes, then divide by the replication factor (RF).
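The arithmetic for that ballpark can be sketched as follows. This is only an illustration: the function name and the sample numbers are made up, and it assumes the per-range values (the partitions_count column of system.size_estimates) have already been read from one node and that data is spread evenly across nodes.

```python
def estimate_total_partitions(partitions_per_range, node_count, replication_factor):
    """Scale one node's size_estimates up to a cluster-wide ballpark.

    partitions_per_range: partitions_count values for every range listed
        on a single node (hypothetical input, fetched separately).
    node_count: number of nodes in the cluster.
    replication_factor: RF of the keyspace; each partition is stored RF times.
    """
    if node_count <= 0 or replication_factor <= 0:
        raise ValueError("node_count and replication_factor must be positive")
    local_total = sum(partitions_per_range)
    # Multiply out by the number of nodes, then divide by RF so that
    # replicas are not counted more than once.
    return round(local_total * node_count / replication_factor)


# Made-up example: three ranges on one node, 3 nodes, RF = 2.
print(estimate_total_partitions([400_000, 500_000, 300_000], 3, 2))
```

With uneven token ownership the per-node totals differ, so averaging the local sum over several nodes before scaling may give a steadier figure.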

Chris Lohfink answered Sep 28 '22 06:09