For my test server, I have a no-replication Cassandra 2.1.6 setup:
CREATE KEYSPACE v2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = false;

CREATE TABLE v2.tiles (
    zoom int,
    idx int,
    tile blob,
    PRIMARY KEY (zoom, idx)
);
For each zoom value, there can be tens of millions of small items. For zoom=11, the first idx is around 100352. When I need to iterate over all items, I always see this timeout error for specific storage cases:
cqlsh:v2> select zoom,idx from tiles where zoom=11 limit 10;
ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
I get the same error for "zoom=11 and idx > 1000". For an idx value closer to the existing items, it gives the correct result:
cqlsh:v2> select zoom,idx from tiles where zoom=11 and idx > 100000 limit 10;
zoom | idx
------+--------
11 | 100352
...
It also correctly returns an empty result when idx is compared against an extremely high value:
cqlsh:v2> select zoom,idx from tiles where zoom=11 and idx > 1000000 limit 10;
zoom | idx
------+-----
(0 rows)
This sounds like a wide-row issue. When you have many items in a single partition (zoom in your case), it can create problems for reads in Cassandra. In general it's a good rule of thumb to keep partitions under ~100MB in size; do you think you may have partitions that large? On average, how many bytes is the 'tile' column? For example, with idx being a 4-byte int and assuming a blob size of 96 bytes, that's 100 bytes per row; ignoring any overhead, ~1,048,576 such rows would add up to 100MB.
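As a rough sanity check, nodetool's per-table statistics report compacted partition minimum, mean and maximum sizes (the exact field names vary a bit between versions), which would tell you how large your zoom partitions actually are on disk:

nodetool cfstats v2.tiles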
Although your page size is small, there is still quite a bit of overhead on Cassandra's end to read the data and its indexes on disk. What seems to be happening is that your C* node is not able to read the data within read_request_timeout_in_ms (default is 5 seconds). When your queries do work, about how long are they taking?
It may be worth enabling tracing ('TRACING ON' in a cqlsh session) to help understand what is taking so long when your queries do succeed. You could also consider increasing read_request_timeout_in_ms to some arbitrarily large value while debugging. A good article on tracing can be found here.
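For example, in a cqlsh session (trace output omitted here; the timings will of course depend on your data):

cqlsh:v2> TRACING ON
cqlsh:v2> select zoom,idx from tiles where zoom=11 and idx > 100000 limit 10;

And if you do want to raise the timeout while debugging, that is the read_request_timeout_in_ms setting in cassandra.yaml on each node (a restart is needed to pick it up; 60000 below is just an arbitrary debugging value, not a production recommendation):

read_request_timeout_in_ms: 60000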
If you find that your rows are too wide, you may consider partitioning your data further, for example by day:
CREATE TABLE v2.tiles (
    zoom int,
    day timestamp,
    idx int,
    tile blob,
    PRIMARY KEY ((zoom, day), idx)
);
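With the composite partition key, reads then need to restrict both zoom and day; a query against the reworked table might look like this (the date is purely illustrative):

select zoom, day, idx from tiles where zoom = 11 and day = '2015-07-01' and idx > 100000 limit 10;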
That said, without knowing more about your data model, time might not be a good way of partitioning.