I have a Python script that sends a batch query to Cassandra, like this:
query = 'BEGIN BATCH ' + 'insert into ... ; insert into ... ; insert into ...; ' + ' APPLY BATCH;'
session.execute(query)
It works for a while, but about two minutes after starting, the script fails and prints:
Traceback (most recent call last):
  File "/home/fervid/Desktop/cassandra/scripts/parse_and_save_to_cassandra.cgi", line 127, in <module>
    session.execute(query)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1103, in execute
    result = future.result(timeout)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2475, in result
    raise OperationTimedOut(errors=self._errors, last_host=self._current_host)
cassandra.OperationTimedOut: errors={}, last_host=127.0.0.1
I changed the timeouts in cassandra.yaml to:
read_request_timeout_in_ms: 15000
range_request_timeout_in_ms: 20000
write_request_timeout_in_ms: 20000
cas_contention_timeout_in_ms: 10000
request_timeout_in_ms: 25000
Then I restarted Cassandra, but it didn't help. The error happens again and again!
Log lines from around the time the script failed:
INFO [BatchlogTasks:1] 2014-06-11 14:18:10,490 ColumnFamilyStore.java (line 794) Enqueuing flush of Memtable-batchlog@28149592(13557969/13557969 serialized/live bytes, 4 ops)
INFO [FlushWriter:10] 2014-06-11 14:18:10,490 Memtable.java (line 363) Writing Memtable-batchlog@28149592(13557969/13557969 serialized/live bytes, 4 ops)
INFO [FlushWriter:10] 2014-06-11 14:18:10,566 Memtable.java (line 410) Completed flushing; nothing needed to be retained. Commitlog position was ReplayPosition(segmentId=1402469922169, position=27138996)
INFO [ScheduledTasks:1] 2014-06-11 14:18:13,758 GCInspector.java (line 116) GC for ParNew: 640 ms for 3 collections, 775214160 used; max is 1052770304
INFO [ScheduledTasks:1] 2014-06-11 14:18:16,155 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 1838 ms for 2 collections, 810976000 used; max is 1052770304
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,959 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 1612 ms for 1 collections, 858404088 used; max is 1052770304
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,959 StatusLogger.java (line 55) Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,959 StatusLogger.java (line 70) ReadStage 0 0 627 0 0
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,960 StatusLogger.java (line 70) RequestResponseStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,960 StatusLogger.java (line 70) ReadRepairStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,960 StatusLogger.java (line 70) MutationStage 0 0 184386 0 0
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,960 StatusLogger.java (line 70) ReplicateOnWriteStage 0 0 0 0 0
This is a client-side timeout (see the link in @Syrial's reply: http://datastax.github.io/python-driver/api/cassandra.html#cassandra.OperationTimedOut)
You can change the Session default timeout:
session = cluster.connect()
session.default_timeout = 30 # this is in *seconds*
You can also change the timeout for a particular query:
session.execute(statement, timeout=30)
You could speed up the execution quite a bit by using prepared statements inside the batch. See the Batching statements section in this post.
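For example, here is a minimal sketch of a prepared statement inside a BatchStatement (the keyspace, table, columns, and sample data are hypothetical, not from your question):

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')  # hypothetical keyspace

# Prepare the INSERT once; only the bound values are sent for each row
insert = session.prepare('INSERT INTO mytable (id, value) VALUES (?, ?)')

batch = BatchStatement()
for row_id, value in [(1, 'a'), (2, 'b'), (3, 'c')]:  # sample data
    batch.add(insert, (row_id, value))

session.execute(batch, timeout=30)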
If you are after even better results, please read these performance notes
According to the docs, this error means the operation took longer than the client-side timeout allowed. The error is raised by the driver, not by Cassandra. I'm still searching for a way to handle this error myself.
http://datastax.github.io/python-driver/api/cassandra.html#cassandra.OperationTimedOut
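As a sketch, one way to handle it on the client is to catch the exception and retry (the retry limit here is arbitrary, for illustration):

from cassandra import OperationTimedOut

MAX_RETRIES = 3  # arbitrary limit for illustration

for attempt in range(MAX_RETRIES):
    try:
        session.execute(query, timeout=30)
        break  # success, stop retrying
    except OperationTimedOut:
        if attempt == MAX_RETRIES - 1:
            raise  # re-raise after the final attempt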