We're on Cassandra 2.0.15 and seeing huge read latencies (>60 s) coming up at regular intervals (about every 3 min) from all app hosts. We measure this latency around calls to session.execute(stmt). At the same time, Cassandra traces report durations of <1 s. We also ran, in a loop, a query via cqlsh from the same hosts during those peak latency times, and cqlsh always came back within 1 s. What can explain this discrepancy at the Java driver level?
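For reference, a sketch of how this comparison can be made from the Java driver itself (illustrative, not our exact canary code; tracing is enabled per statement, and getQueryTrace may block briefly while the trace rows are fetched):

    import com.datastax.driver.core.{Session, SimpleStatement}

    // Compare client wall-clock latency against the coordinator-reported trace duration.
    // `session` is an already-connected com.datastax.driver.core.Session.
    def timeWithTrace(session: Session): Unit = {
      val stmt = new SimpleStatement("SELECT service_name FROM service_names")
      stmt.enableTracing() // ask the coordinator to record a query trace

      val start = System.currentTimeMillis()
      val rs = session.execute(stmt)
      rs.all() // force all rows to be fetched so paging time is included
      val clientMs = System.currentTimeMillis() - start

      val trace = rs.getExecutionInfo.getQueryTrace // server-side view of the same query
      val serverMs = trace.getDurationMicros / 1000.0
      println(s"client-side: ${clientMs} ms, server-side trace: ${serverMs} ms")
    }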
-- edit: in reply to comments --
Cassandra servers JVM settings: -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -Xms32G -Xmx32G -XX:+UseG1GC -Djava.net.preferIPv4Stack=true -Dcassandra.jmx.local.port=7199 -XX:+DisableExplicitGC
Client-side GC is negligible. Client settings: -Xss256k -Xms4G -Xmx4G; the Cassandra driver version is 2.1.7.1.
Client side measuring code:
val selectServiceNames = session.prepare(QueryBuilder.select("service_name").from("service_names"))

override def run(): Unit = {
  val start = System.currentTimeMillis()
  try {
    val resultSet = session.execute(selectServiceNames.bind())
    val serviceNames = resultSet.all()
    val elapsed = System.currentTimeMillis() - start
    latency.add(elapsed) // emits metric to statsd
    if (elapsed > 10000) {
      log.info("Canary2 sensed high Cassandra latency: " + elapsed + "ms")
    }
  } catch {
    case e: Throwable =>
      log.error(e, "Canary2 select failed")
  } finally {
    Thread.sleep(100)
    schedule()
  }
}
Cluster construction code:
def createClusterBuilder(): Cluster.Builder = {
  val builder = Cluster.builder()
  val contactPoints = parseContactPoints()
  val defaultPort = findConnectPort(contactPoints)
  builder.addContactPointsWithPorts(contactPoints)
  builder.withPort(defaultPort) // This ends up config.protocolOptions.port
  if (cassandraUsername.isDefined && cassandraPassword.isDefined)
    builder.withCredentials(cassandraUsername(), cassandraPassword())
  builder.withRetryPolicy(ZipkinRetryPolicy.INSTANCE)
  builder.withLoadBalancingPolicy(new TokenAwarePolicy(new LatencyAwarePolicy.Builder(new RoundRobinPolicy()).build()))
}
One more observation I cannot explain. I ran two threads that execute the same query in the same manner (as above) in a loop; the only difference is that the yellow thread sleeps 100 ms between queries, while the green thread sleeps 60 s between queries. The green thread hits low latency (under 1 s) much more often than the yellow one.
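For completeness, a minimal sketch of how the two canaries were scheduled (runQueryOnce is a stand-in for the timed query in the measuring code above, not a real method from our codebase):

    import java.util.concurrent.{Executors, TimeUnit}

    // Two canaries running the same timed query at different intervals.
    val scheduler = Executors.newScheduledThreadPool(2)

    def runQueryOnce(name: String): Unit = {
      // execute the canary query here and emit its latency tagged with `name`
    }

    // "yellow": 100 ms between queries; "green": 60 s between queries
    scheduler.scheduleWithFixedDelay(new Runnable { def run(): Unit = runQueryOnce("yellow") }, 0, 100, TimeUnit.MILLISECONDS)
    scheduler.scheduleWithFixedDelay(new Runnable { def run(): Unit = runQueryOnce("green") }, 0, 60, TimeUnit.SECONDS)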
This is a common problem when you get a component to test itself.
The most likely explanation is queuing on the client side before the request is actually executed. Say you have a queue of 100 tasks, but because the system is running slowly each task takes 1 second. Each task times itself internally and sees 1 second, but while the first task starts after 0 seconds, the last one starts after 99 seconds and then reports that it took 1 second. From your point of view, however, it took 100 seconds to complete, 99 of which were spent waiting to start.
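A toy illustration of that effect (not from the original answer; the numbers are arbitrary): each task measures only its own run time, while the caller sees queue wait plus run time.

    import java.util.concurrent.Executors

    object QueueDelayDemo extends App {
      // One worker, many tasks: self-measured time stays small while end-to-end time grows.
      val executor = Executors.newSingleThreadExecutor()
      val submitTime = System.currentTimeMillis()

      val futures = (1 to 100).map { i =>
        executor.submit(new Runnable {
          def run(): Unit = {
            val start = System.currentTimeMillis()
            Thread.sleep(10) // the "work": 10 ms per task
            val selfMeasured = System.currentTimeMillis() - start
            val endToEnd = System.currentTimeMillis() - submitTime
            println(s"task $i: self-measured ${selfMeasured} ms, end-to-end ${endToEnd} ms")
          }
        })
      }

      futures.foreach(_.get()) // wait for all tasks to finish
      executor.shutdown()
    }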
There can also be delays in the results reaching you, but this is less likely unless the processing you do on the results takes longer than the database query itself, i.e. you may be assuming the bottleneck is on the server when it is actually on the client.
I tracked the issue down to queries timing out on the nodes from the remote data center. The cluster has nodes in two DCs, but the keyspace is only replicated within the local DC, so it is surprising that remote nodes were even considered. I was able to bring the latency down by excluding the remote data center's nodes from the query plan.
It still feels to me like a bug in the Java driver that it tries to use nodes from the remote data center as coordinator nodes when the keyspace is clearly not replicated to that data center. Also, even if that weren't possible for some reason, I was also using the latency-aware policy, which should have excluded remote-DC nodes from consideration.
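For anyone hitting the same thing, a sketch of the kind of policy change that keeps coordinators in the local data center (the data center name "local_dc" is a placeholder; this mirrors the builder code above with DCAwareRoundRobinPolicy swapped in for the plain RoundRobinPolicy):

    import com.datastax.driver.core.Cluster
    import com.datastax.driver.core.policies.{DCAwareRoundRobinPolicy, LatencyAwarePolicy, TokenAwarePolicy}

    // Only nodes in "local_dc" (placeholder) are used as coordinators; token and
    // latency awareness are layered on top, as in the original cluster builder.
    def withLocalDcOnly(builder: Cluster.Builder): Cluster.Builder =
      builder.withLoadBalancingPolicy(
        new TokenAwarePolicy(
          new LatencyAwarePolicy.Builder(new DCAwareRoundRobinPolicy("local_dc")).build()))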