
Cassandra operation timed out

Operation timed out - received only 0 responses.', info: 'Represents an error message from the server', code: 4608, consistencies: 1, received: 0, blockFor: 1, isDataPresent: 0, ...

I get this error a few times a day when running SELECT queries against my Cassandra cluster. We have a 3-node cluster on m1.large AWS instances. The queries succeed most of the time, but every once in a while we get the above error. We are not in production yet, so all tables are small: none has more than a few thousand rows, and the same queries complete fine at other times. Raising the timeout is not an option, and I don't believe it would solve the problem (the queries should be short, and the query in the error is not the same each time).

Could this be a connection going stale between nodes, or a network issue? What's the best way to test for these? I also only see this error on the client side; is there somewhere I should be seeing it in the Cassandra logs?

Alex Yurkowski asked Jul 06 '16 18:07


1 Answer

This is actually an error coming back from the C* server that is responsible for handling your request (aka 'the coordinator').

It looks like you are querying with a consistency level of ONE (consistencies: 1, blockFor: 1 in the error), so only one replica holding the data needs to respond to the coordinator within the read_request_timeout_in_ms configured in cassandra.yaml on the server (the default is 5000 ms). In your case no replica responded within that period (received: 0).

Timeouts can happen, and your application should be prepared to handle them according to your preferences (fail outright, retry, increase the replication factor to make them less likely, etc.).
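As a rough illustration of the "retry" option at the application level, here is a minimal sketch. The helper names `isReadTimeout` and `withRetry`, and the `retries` / `baseDelayMs` options, are hypothetical and not part of any driver; the only thing taken from the question is the error code 4608 (0x1200), which is the server's read timeout code.

```javascript
// Server error code for a read timeout (0x1200 === 4608),
// matching the `code: 4608` in the error from the question.
const READ_TIMEOUT_CODE = 0x1200;

// Hypothetical helper: true if the error looks like a server-side read timeout.
function isReadTimeout(err) {
  return Boolean(err) && err.code === READ_TIMEOUT_CODE;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs the async function `fn` and retries it only on
// read timeouts, doubling the delay after each failed attempt.
// Any other error, or a timeout after the last allowed retry, is rethrown.
async function withRetry(fn, { retries = 2, baseDelayMs = 100 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isReadTimeout(err) || attempt >= retries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Assuming `client` is an already connected driver client, usage might look like `await withRetry(() => client.execute(query, params, { prepare: true }))`. Retrying like this is reasonable for reads because they are idempotent; the same wrapper should not be applied blindly to non-idempotent writes.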

Here are a few things you should consider:

  1. Increase the replication factor of the keyspace you are querying data from. If your replication factor is 1, you are dependent on 1 node to be available to respond to queries for a particular partition. Increasing your RF to something like 3 will make your application more resilient to poorly-performing nodes or nodes going down.
  2. Configure your RetryPolicy to retry reads in the way you'd like. The default with the nodejs-driver is to retry a read only once, and only if received >= blockFor but the data was not retrieved (which was not the case for you, since received was 0).
  3. Increase read_request_timeout_in_ms in your cassandra.yaml. I would discourage this, though: 5000 ms should be more than enough unless you have a poor configuration, environment, or queries.
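To make the retry decision in item 2 concrete, here is a minimal sketch of the kind of logic a custom policy might apply to a read timeout. The function `decideOnReadTimeout` and its options `maxRetries` and `retryWhenNoResponse` are hypothetical names for illustration; the inputs mirror the fields of the error above (received, blockFor, isDataPresent), and it returns a plain string rather than a real driver decision object. With default options it approximates the default behaviour described above; it is not the driver's actual implementation.

```javascript
// Hypothetical decision function for a read timeout.
// `timeout` carries the fields reported in the error: received, blockFor,
// isDataPresent, plus nbRetry (how many retries were already attempted).
function decideOnReadTimeout(timeout, options = {}) {
  const { nbRetry, received, blockFor, isDataPresent } = timeout;
  const { maxRetries = 1, retryWhenNoResponse = false } = options;

  // Never retry more than the configured number of times.
  if (nbRetry >= maxRetries) return "rethrow";

  // Enough replicas answered but the data itself did not arrive:
  // a retry is likely to succeed.
  if (received >= blockFor && !isDataPresent) return "retry";

  // Opt-in (assumption): also retry when no replica answered at all,
  // which is the situation in the question (received 0, blockFor 1).
  if (retryWhenNoResponse && received === 0) return "retry";

  return "rethrow";
}

// The timeout from the question, on the first attempt.
const fromQuestion = { nbRetry: 0, received: 0, blockFor: 1, isDataPresent: 0 };
console.log(decideOnReadTimeout(fromQuestion));
console.log(decideOnReadTimeout(fromQuestion, { maxRetries: 2, retryWhenNoResponse: true }));
```

With the default options the question's timeout is rethrown to the application, which matches what the asker observed; enabling the hypothetical `retryWhenNoResponse` option would retry it instead. In a real policy the two outcomes would map onto the driver's retry and rethrow decisions.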
Andy Tolbert answered Sep 24 '22 17:09