
How to obtain reliable insert times in Cassandra?

Tags: c#, cassandra

I am currently benchmarking a 3-node Cassandra cluster using CassandraSharp. I care more about latency than throughput, so after a bit of GC tuning, here are my numbers (on 100 000K inserts, single-threaded):

  • Iter/sec: 1600
  • Average: 600µs
  • 95th percentile: 600µs
  • 99th percentile: 5000µs
  • Max: 50 000µs

My problem is that once in a while I get a "bad" latency (50ms). My goal is consistent latency, even at the cost of a higher average.
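For reference, figures like the ones above can be gathered with a small timing harness. The sketch below is in Python rather than C# and does not use CassandraSharp; `insert` is a hypothetical stand-in for whatever call performs one synchronous insert, and the percentile uses the nearest-rank definition, which is only one of several common conventions.

```python
import time


def percentile(sorted_samples, p):
    """Nearest-rank percentile of an ascending-sorted list.

    p is an integer in 1..100; integer arithmetic avoids float rounding.
    """
    rank = (p * len(sorted_samples) + 99) // 100  # ceil(p * n / 100)
    return sorted_samples[max(rank, 1) - 1]


def summarize(latencies_us):
    """Return average, 95th/99th percentile and max of the given latencies."""
    s = sorted(latencies_us)
    return {
        "average": sum(s) / len(s),
        "p95": percentile(s, 95),
        "p99": percentile(s, 99),
        "max": s[-1],
    }


def time_inserts(insert, count):
    """Call insert(i) count times; return per-call latencies in microseconds.

    `insert` is a hypothetical callable wrapping one synchronous insert.
    """
    latencies = []
    for i in range(count):
        start = time.perf_counter_ns()
        insert(i)
        latencies.append((time.perf_counter_ns() - start) / 1000)
    return latencies
```

Keeping every sample, rather than only a running average, is what makes the rare outliers visible in the 99th percentile and the maximum.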

I believe that this is caused by the GC, and I'm wondering if it could be avoided.

(As a side note, is it good practice to send a large number of inserts to one node and let it handle them, or should I "load balance" them in the client?)

alprema asked Oct 24 '22 05:10


2 Answers

50ms is within the normal range for a young-generation garbage collection. You can enable GC logging in cassandra-env.sh by uncommenting the appropriate lines towards the bottom to verify that this is the problem.
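Once GC logging is on, one way to check the correlation is to see how many slow inserts overlap a logged pause. The sketch below is a rough illustration in Python, not a parser for any particular GC log format: it assumes you have already extracted pause intervals as `(start_s, end_s)` pairs and client-side samples as `(start_s, latency_us)` pairs, and that both use the same clock. The function name and the threshold are hypothetical.

```python
def fraction_explained_by_gc(samples, pauses, threshold_us):
    """Share of slow samples whose execution window overlaps a GC pause.

    samples: list of (start_s, latency_us) measured on the client.
    pauses:  list of (start_s, end_s) taken from the GC log.
    Both are assumed to be on the same clock (an assumption of this sketch).
    """
    slow = [(t, lat) for t, lat in samples if lat >= threshold_us]
    if not slow:
        return 0.0
    hits = 0
    for start, lat in slow:
        end = start + lat / 1_000_000  # microseconds to seconds
        # Two intervals overlap when each one starts before the other ends.
        if any(p_start <= end and start <= p_end for p_start, p_end in pauses):
            hits += 1
    return hits / len(slow)
```

A value close to 1.0 would support the GC explanation; a value close to 0.0 suggests looking elsewhere.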

(Flushes do not block inserts unless your disk is so slow it can't keep up with insert volume, which is unusual since flushes are sequential I/O.)

If young generation collections are indeed correlated with the higher latencies, you can try making the young generation smaller (also configured in cassandra-env.sh), at the potential cost of some throughput in exchange for shorter pauses.

jbellis answered Nov 01 '22 12:11


I don't think you'll be able to avoid the occasional bad latency entirely. It's most likely caused either by the GC, as you mention, or by a flush of the Memtables to disk.

Is the bad insert of 50ms really a problem? Cassandra supports batch mutators that let you queue your insert operations up in one long mutator and then perform the batch of inserts at a later time, so that your main thread doesn't get blocked by a synchronous insert that may take longer than expected. I haven't used CassandraSharp, so I don't know whether it exposes this functionality.
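The general idea of keeping the main thread off the slow path can be sketched without any particular client library. The Python example below is only an illustration under stated assumptions: `send_batch` is a hypothetical callable that writes a list of rows to the cluster, and the class is not part of CassandraSharp or of any Cassandra driver.

```python
import queue
import threading


class BackgroundBatcher:
    """Buffers rows in a queue and flushes them in batches from a worker thread."""

    _STOP = object()  # sentinel telling the worker to finish

    def __init__(self, send_batch, max_batch=100):
        self._send_batch = send_batch  # hypothetical: callable taking a list of rows
        self._max_batch = max_batch
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def insert(self, row):
        """Enqueue a row and return immediately, without waiting on the cluster."""
        self._queue.put(row)

    def close(self):
        """Flush everything queued so far, then stop the worker."""
        self._queue.put(self._STOP)
        self._worker.join()

    def _run(self):
        stopping = False
        while not stopping:
            batch = []
            item = self._queue.get()  # block until there is work
            while True:
                if item is self._STOP:
                    stopping = True
                    break
                batch.append(item)
                if len(batch) >= self._max_batch:
                    break
                try:
                    item = self._queue.get_nowait()  # drain what is already queued
                except queue.Empty:
                    break
            if batch:
                self._send_batch(batch)
```

With this shape, a 50ms stall is absorbed by the worker thread and the caller of `insert` does not see it. The queue here is unbounded and errors from `send_batch` are not handled, so a real implementation would need backpressure and retry logic.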

Also, load-balancing across the Cassandra nodes will slightly improve your import times. Remember, though, that behind the scenes the node you send the import to hands it off to the correct node for storage, so the node you give it to really acts as a proxy. I wouldn't expect much improvement in the general case. It will help if for some reason that node starts doing other things and its performance suffers.
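If you do want to spread requests in the client, the simplest policy is round-robin over the known nodes. The Python sketch below is illustrative only: the class name and node names are hypothetical, and a real client library may already offer its own balancing policy.

```python
import itertools


class RoundRobinNodes:
    """Hands out node addresses in a fixed rotating order."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._cycle = itertools.cycle(nodes)

    def next_node(self):
        """Return the node that should act as coordinator for the next request."""
        return next(self._cycle)


# Example with hypothetical node names:
# nodes = RoundRobinNodes(["node1", "node2", "node3"])
# nodes.next_node()  # "node1", then "node2", "node3", "node1", ...
```

This only rotates which node acts as the proxy; the write still ends up on whichever node owns the data.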

agentgonzo answered Nov 01 '22 13:11