Cassandra ReadTimeout when querying existing data

For my test server, I have a no-replication Cassandra 2.1.6 setup:

CREATE KEYSPACE v2 WITH replication =
{'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = false;

CREATE TABLE v2.tiles (
    zoom int,
    idx int,
    tile blob,
    PRIMARY KEY (zoom, idx)
)

For each zoom value, there could be tens of millions of small items. For zoom=11, the first idx is around 100352. When I need to iterate over all items, I always see this timeout error for specific storage cases:

cqlsh:v2> select zoom,idx from tiles where zoom=11 limit 10;
ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

I get the same error for "zoom=11 and idx > 1000". For an idx value closer to the existing items, it gives the right result:

cqlsh:v2> select zoom,idx from tiles where zoom=11 and idx > 100000 limit 10;
 zoom | idx
------+--------
   11 | 100352
...

It also shows correct empty results when idx is compared with an extremely high value:

cqlsh:v2> select zoom,idx from tiles where zoom=11 and idx > 1000000 limit 10;                                       
 zoom | idx | tile
------+-----+------
(0 rows)
Asked Feb 10 '23 by Yuri Astrakhan

1 Answer

For each zoom value, there could be tens of millions of small items. For zoom=11, the first idx is around 100352. When I need to iterate over all items, I always see this timeout error for specific storage cases.

This sounds like a wide row issue. When you have many items for a single partition (zoom in your case), it can create problems for reads in Cassandra. In general, a good rule of thumb is to keep partitions under ~100MB in size; do you think you may have partitions that large? On average, how many bytes is the 'tile' column? For example, with idx being a 4-byte int and assuming a blob size of 96 bytes, each row takes roughly 100 bytes; ignoring any overhead, ~1,048,576 such rows would add up to 100MB.
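One way to sanity-check the partition sizes (my suggestion, assuming you have shell access to the Cassandra node) is to ask the node itself, since it tracks per-table statistics; the compacted partition mean/maximum bytes lines give a rough idea of how wide the zoom partitions really are:

nodetool cfstats v2.tiles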

Although your page size is small, there is still quite a bit of overhead on Cassandra's end to read the data and its indexes on disk. What seems to be happening is that your C* node is not able to read the data within read_request_timeout_in_ms (the default is 5 seconds). When your queries do work, about how long are they taking?

It may be worth enabling tracing ('TRACING ON' in a cqlsh session) to help understand what is taking so long when your queries do succeed. You could also consider increasing read_request_timeout_in_ms to some arbitrarily large value while debugging. A good article on tracing can be found here.
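As a minimal cqlsh sketch, using the query from the question that does succeed:

cqlsh:v2> TRACING ON;
cqlsh:v2> select zoom,idx from tiles where zoom=11 and idx > 100000 limit 10;
cqlsh:v2> TRACING OFF;

cqlsh then prints the trace after the result rows, which typically shows how many SSTables were read and how many tombstone cells were scanned along the way.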

If you find that your rows are too wide, you may consider partitioning your data further, for example by day:

CREATE TABLE v2.tiles (
    zoom int,
    day timestamp,
    idx int,
    tile blob,
    PRIMARY KEY ((zoom, day), idx)
)

That said, without knowing more about your data model, time might not be a good way of partitioning.
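With a layout like that, iterating a zoom level becomes a series of bounded single-partition reads, one per day. As a rough sketch (the date value here is made up):

cqlsh:v2> select idx, tile from tiles where zoom=11 and day='2015-07-01' limit 10;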

Answered Feb 23 '23 by Andy Tolbert