I am pulling a large amount of data from Cassandra 2.0, but unfortunately I am getting a timeout exception. My table:
CREATE KEYSPACE StatisticsKeyspace
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
CREATE TABLE StatisticsKeyspace.HourlyStatistics(
KeywordId text,
Date timestamp,
HourOfDay int,
Impressions int,
Clicks int,
AveragePosition double,
ConversionRate double,
AOV double,
AverageCPC double,
Cost double,
Bid double,
PRIMARY KEY(KeywordId, Date, HourOfDay)
);
CREATE INDEX ON StatisticsKeyspace.HourlyStatistics(Date);
My query:
SELECT KeywordId, Date, HourOfDay, Impressions, Clicks,AveragePosition,ConversionRate,AOV,AverageCPC,Bid
FROM StatisticsKeyspace.hourlystatistics
WHERE Date >= '2014-03-22' AND Date <= '2014-03-24'
I've changed the timeout settings in my cassandra.yaml file:
read_request_timeout_in_ms: 60000
range_request_timeout_in_ms: 60000
write_request_timeout_in_ms: 40000
cas_contention_timeout_in_ms: 3000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 60000
But it still throws a timeout after approximately 10 seconds. Any ideas how I can fix this problem?
If you are using the Java client from DataStax, pagination is enabled by default with a fetch size of 5000 rows. If you still get a timeout, you can try reducing it with
public Statement setFetchSize(int fetchSize)
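If you are using the CLI, you may need to experiment with some kind of manual pagination: fetch a first page with a LIMIT, then request the next page starting after the last KeywordId you received: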
SELECT KeywordId, Date, HourOfDay, Impressions, Clicks,AveragePosition,ConversionRate,AOV,AverageCPC,Bid
FROM StatisticsKeyspace.hourlystatistics
WHERE Date >= '2014-03-22' AND Date <= '2014-03-24'
LIMIT 100;
SELECT * FROM .... WHERE token(KeywordId) > token([Last KeywordId received]) AND ...
LIMIT 100;
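The paging loop behind these two queries can be sketched in plain Java. This is only an in-memory simulation under stated assumptions, not driver code: `token()` here is a hypothetical stand-in based on `hashCode()` (Cassandra's partitioner produces different values), and the names `TokenPagingSketch`, `fetchPage` and `fetchAll` are made up for illustration. It also treats each KeywordId as a single row; in the real table one KeywordId holds many rows, so a LIMIT that cuts a partition in half would need extra handling that this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of manual paging by token (no Cassandra involved).
class TokenPagingSketch {

    // Hypothetical stand-in for CQL token(); only the ordering idea matters.
    // Assumes the sample keys have distinct hash codes.
    static long token(String keywordId) {
        return keywordId.hashCode();
    }

    // Simulates: SELECT ... WHERE token(KeywordId) > token(lastKey) LIMIT pageSize
    // Passing null as lastKey fetches the first page.
    static List<String> fetchPage(TreeMap<Long, String> table, String lastKey, int pageSize) {
        Map<Long, String> rest =
                (lastKey == null) ? table : table.tailMap(token(lastKey), false);
        List<String> page = new ArrayList<>();
        for (String key : rest.values()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Keeps requesting pages, each starting after the last key received,
    // until an empty page signals the end of the data.
    static List<String> fetchAll(List<String> keys, int pageSize) {
        TreeMap<Long, String> table = new TreeMap<>();
        for (String key : keys) {
            table.put(token(key), key); // rows are ordered by token, not by key
        }
        List<String> all = new ArrayList<>();
        String last = null;
        while (true) {
            List<String> page = fetchPage(table, last, pageSize);
            if (page.isEmpty()) {
                break;
            }
            all.addAll(page);
            last = page.get(page.size() - 1);
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("kw3", "kw1", "kw5", "kw2", "kw4");
        System.out.println(fetchAll(keys, 2)); // every key appears exactly once
    }
}
```

Against a real cluster, `fetchPage` would be replaced by the actual query, with the last KeywordId of each page fed into the next one.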
To detect cluster issues, try a SELECT with a LIMIT of 1; if even that times out, there may be an underlying problem.
Hope that helps.
If you are still experiencing performance issues with your query, I would look at your secondary index, since the amount of data transferred seems reasonable (only 'small' data types are returned). If I am right, changing the fetch size will not change much. Instead: do you insert dates only in your "Date" (timestamp) column? If you are inserting actual timestamps, the secondary index on this column will be very slow due to its high cardinality. If you insert a date only, the timestamp defaults to date + "00:00:00" + TZ, which should reduce the cardinality and thus improve the look-up speed (watch out for timezone issues!). To be absolutely sure, try a secondary index on a column with a different data type, such as an int for Date (counting the days since 1970-01-01 or something similar).
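The "days since 1970-01-01" idea could be computed on the client before inserting or querying. Here is a minimal sketch using `java.time` (Java 8+); the class and method names are hypothetical, and it assumes the date is handled as a plain calendar date with no time zone attached.

```java
import java.time.LocalDate;

// Converts calendar dates to and from an int day number (days since 1970-01-01).
class DayNumberSketch {

    // "2014-03-22" -> number of days since 1970-01-01
    static int toDayNumber(String isoDate) {
        return Math.toIntExact(LocalDate.parse(isoDate).toEpochDay());
    }

    // Inverse conversion, useful when reading the int column back.
    static String fromDayNumber(int dayNumber) {
        return LocalDate.ofEpochDay(dayNumber).toString();
    }

    public static void main(String[] args) {
        System.out.println(toDayNumber("2014-03-22")); // 16151
        System.out.println(toDayNumber("2014-03-24")); // 16153
        System.out.println(fromDayNumber(16151));      // 2014-03-22
    }
}
```

The range in the question would then cover the day numbers 16151 to 16153, so the index has one distinct value per day.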