 

select count(*) runs into timeout issues in Cassandra


Maybe it is a stupid question, but I'm not able to determine the size of a table in Cassandra.

This is what I tried:

select count(*) from articles;

It works fine if the table is small but once it fills up, I always run into timeout issues:

cqlsh:

  • OperationTimedOut: errors={}, last_host=127.0.0.1

DBeaver:

  • Run 1: 225,000 (7477 ms)
  • Run 2: 233,637 (8265 ms)
  • Run 3: 216,595 (7269 ms)

I assume that it hits some timeout and just aborts. The actual number of entries in the table is probably much higher.

I'm testing against a local Cassandra instance which is completely idle. I would not mind if it has to do a full table scan and is unresponsive during that time.

Is there a way to reliably count the number of entries in a Cassandra table?

I'm using Cassandra 2.1.13.

Philipp Claßen asked Apr 20 '16 12:04

2 Answers

As far as I can see, your problem is the client-side timeout in cqlsh, which is what produces the error OperationTimedOut: errors={}, last_host=127.0.0.1.

You can simply increase it with these options:

 --connect-timeout=CONNECT_TIMEOUT
                       Specify the connection timeout in seconds (default: 5
                       seconds).
 --request-timeout=REQUEST_TIMEOUT
                       Specify the default request timeout in seconds
                       (default: 10 seconds).
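For example, the count could be run with a much larger client-side timeout like this (the keyspace name mykeyspace is a placeholder for your own schema):

```shell
# Give the server up to an hour before cqlsh gives up on the request
cqlsh --request-timeout=3600 -e "SELECT COUNT(*) FROM mykeyspace.articles;"
```

Note that this only raises the client-side timeout; for a full table scan the server's own read timeout (read_request_timeout_in_ms in cassandra.yaml) may also need to be increased.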
Oleksandr Petrenko answered Sep 19 '22 02:09

Here is my current workaround:

COPY articles TO '/dev/null';
...
3568068 rows exported to 1 files in 2 minutes and 16.606 seconds.

Background: Cassandra supports exporting a table to a text file, for instance:

COPY articles TO '/tmp/data.csv';
Output: 3568068 rows exported to 1 files in 2 minutes and 25.559 seconds

That also matches the number of lines in the generated file:

$ wc -l /tmp/data.csv
3568068
Philipp Claßen answered Sep 18 '22 02:09