
cassandra-stress "Failed to connect over JMX; not collecting these stats"

I'm trying to use the cassandra-stress tool for the first time today. Although I'm able to run the tool, the output shows a lot of "Failed to connect over JMX; not collecting these stats" messages.

Command

cassandra-stress user \
    profile=./stress_write.yaml ops\(insert=1\) \
    n=1000000 \
    -log file=./stress_write.log \
    -node node1,node2,node3,node4,node5,node6

Output

WARN  19:44:25 Found host with 0.0.0.0 as rpc_address, using listen_address (/node5) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN  19:44:25 Found host with 0.0.0.0 as rpc_address, using listen_address (/node1) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN  19:44:25 Found host with 0.0.0.0 as rpc_address, using listen_address (/node2) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN  19:44:25 Found host with 0.0.0.0 as rpc_address, using listen_address (/node4) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN  19:44:25 Found host with 0.0.0.0 as rpc_address, using listen_address (/node3) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN  19:44:26 Found host with 0.0.0.0 as rpc_address, using listen_address (/node5) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN  19:44:26 Found host with 0.0.0.0 as rpc_address, using listen_address (/node1) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN  19:44:26 Found host with 0.0.0.0 as rpc_address, using listen_address (/node2) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN  19:44:26 Found host with 0.0.0.0 as rpc_address, using listen_address (/node4) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN  19:44:26 Found host with 0.0.0.0 as rpc_address, using listen_address (/node3) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
INFO  19:44:26 Using data-center name 'DC2' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
INFO  19:44:26 New Cassandra host /node2:9042 added
INFO  19:44:26 New Cassandra host /node5:9042 added
Connected to cluster: MyCluster
INFO  19:44:26 New Cassandra host /node4:9042 added
INFO  19:44:26 New Cassandra host /node1:9042 added
INFO  19:44:26 New Cassandra host /node6:9042 added
Datatacenter: DC2; Host: /node4; Rack: rack1
Datatacenter: DC2; Host: /node3; Rack: rack1
Datatacenter: DC2; Host: /node6; Rack: rack1
Datatacenter: DC2; Host: /node5; Rack: rack1
Datatacenter: DC2; Host: /node1; Rack: rack1
Datatacenter: DC2; Host: /node2; Rack: rack1
INFO  19:44:26 New Cassandra host /node3:9042 added
Created schema. Sleeping 6s for propagation.
Failed to connect over JMX; not collecting these stats
Generating batches with [1..1] partitions and [1..1] rows (of [1..1] total rows in the partitions)
Failed to connect over JMX; not collecting these stats
Failed to connect over JMX; not collecting these stats
Improvement over 4 threadCount: 36%
Failed to connect over JMX; not collecting these stats
Improvement over 8 threadCount: 138%
Failed to connect over JMX; not collecting these stats
Improvement over 16 threadCount: 48%
Failed to connect over JMX; not collecting these stats
Improvement over 24 threadCount: 33%
Failed to connect over JMX; not collecting these stats
Improvement over 36 threadCount: 27%
Failed to connect over JMX; not collecting these stats
Improvement over 54 threadCount: 39%
Failed to connect over JMX; not collecting these stats
Improvement over 81 threadCount: 37%
Failed to connect over JMX; not collecting these stats
Improvement over 121 threadCount: 16%
Failed to connect over JMX; not collecting these stats
Improvement over 181 threadCount: 1%
Failed to connect over JMX; not collecting these stats
Improvement over 271 threadCount: 15%
Failed to connect over JMX; not collecting these stats
Improvement over 406 threadCount: 3%
Failed to connect over JMX; not collecting these stats
Improvement over 609 threadCount: -3%

Is there any command-line or file-based configuration parameter that I need to specify for JMX? I have tested and confirmed that connectivity between the stress machine and my nodes is not the issue, because I was able to establish a connection between them via jmxsh.

Another issue with the output, which may or may not be related to the JMX error, is that it is missing some key parts. I'm quoting the sample output from this Datastax documentation page to show the parts that are missing from what I got:

WARNING: uncertainty mode (err<) results in uneven workload between thread runs, so should be used for high level analysis only
Running with 4 threadCount
Running WRITE with 4 threads until stderr of mean < 0.02
total ops , adj row/s,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr,  gc: #,  max ms,  sum ms,  sdv ms,      mb
2552      ,      2553,    2553,    2553,    2553,     1.5,     1.4,     2.5,     6.0,    12.6,    18.0,    1.0,  0.00000,      0,       0,       0,       0,       0
5173      ,      2634,    2613,    2613,    2613,     1.5,     1.5,     1.8,     2.6,     8.6,     9.2,    2.0,  0.00000,      0,       0,       0,       0,       0
...

Results:
op rate                   : 3954
partition rate            : 3954
row rate                  : 3954
latency mean              : 1.0
latency median            : 0.8
latency 95th percentile   : 1.5
latency 99th percentile   : 1.8
latency 99.9th percentile : 2.2
latency max               : 73.6
total gc count            : 25
total gc mb               : 1826
total gc time (s)         : 1
avg gc time(ms)           : 37
stdev gc time(ms)         : 10
Total operation time      : 00:00:59
Sleeping for 15s
Running with 4 threadCount

Notes

  • My cluster is running DSE 4.6.1 (Cassandra 2.0.12)
  • I'm running the stress tool from a different machine
  • The stress tool version is from DSC 2.1 (Cassandra 2.1)
PJ. asked Mar 29 '15 13:03



2 Answers

I have the same setup (Cassandra 2.0.12, with the stress tool from 2.1) and saw similar issues. I finally had some time to investigate.

I downloaded the source code and ran it in the debugger. What I saw is that this error message is misleading: the tool does connect over JMX, but it has a problem with one of the MBeans (org.apache.cassandra.service:type=GCInspector).

I saw the same exception when I ran the stress test with the option -log level=verbose:

java.lang.reflect.UndeclaredThrowableException
        at com.sun.proxy.$Proxy11.getAndResetStats(Unknown Source)
        at org.apache.cassandra.tools.NodeProbe.getAndResetGCStats(NodeProbe.java:385)
        at org.apache.cassandra.stress.util.JmxCollector.<init>(JmxCollector.java:86)
        at org.apache.cassandra.stress.StressMetrics.<init>(StressMetrics.java:64)
        at org.apache.cassandra.stress.StressAction.run(StressAction.java:187)
        at org.apache.cassandra.stress.StressAction.warmup(StressAction.java:97)
        at org.apache.cassandra.stress.StressAction.run(StressAction.java:61)
        at org.apache.cassandra.stress.Stress.main(Stress.java:109)
Caused by: javax.management.InstanceNotFoundException: org.apache.cassandra.service:type=GCInspector 
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(Unknown Source)
        at ....

I connected to Cassandra using JConsole and confirmed that version 2.0.12 does not have this MBean.

My output, however, has most of the data shown in the sample (except for the garbage collection statistics).

Have you tried running cassandra-stress with the default configuration? Also try setting the log level to verbose; maybe it will give you some ideas.
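A minimal sketch of rerunning the command from the question with verbose logging, so that the underlying exception is printed rather than only the one-line JMX warning. The level=verbose setting is the one mentioned above; the profile path, log path, and node names are copied from the question and stand in for your own. The RUN variable and the stress_verbose function name are conveniences introduced here and are not part of cassandra-stress.

```shell
#!/usr/bin/env bash
# Rebuild the question's cassandra-stress command with level=verbose added
# to the -log option group. By default the command is only printed;
# set RUN=1 to actually execute it.
stress_verbose() {
    set -- cassandra-stress user \
        profile=./stress_write.yaml 'ops(insert=1)' \
        n=1000000 \
        -log file=./stress_write.log level=verbose \
        -node node1,node2,node3,node4,node5,node6

    # Printed for inspection only; the parentheses would need quoting
    # if this line were pasted back into a shell.
    echo "$*"

    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

stress_verbose
```

With verbose logging on, the stack trace behind each "Failed to connect over JMX" line should show whether the cause is a missing MBean, as here, or a genuine connection problem.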

jny answered Sep 21 '22 16:09



I was also facing the same issue (Cassandra 3.7). I ran my cassandra-stress client with -log level=verbose and saw the exception below:

java.lang.RuntimeException: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 1.2.3.4; nested exception is: java.net.ConnectException: Connection timed out]
        at org.apache.cassandra.stress.util.JmxCollector.connect(JmxCollector.java:99)
        at org.apache.cassandra.stress.util.JmxCollector.<init>(JmxCollector.java:85)
        at org.apache.cassandra.stress.StressMetrics.<init>(StressMetrics.java:62)
        at org.apache.cassandra.stress.StressAction.run(StressAction.java:211)
        at org.apache.cassandra.stress.StressAction.warmup(StressAction.java:107)
        at org.apache.cassandra.stress.StressAction.run(StressAction.java:60)
        at org.apache.cassandra.stress.Stress.run(Stress.java:133)
        at org.apache.cassandra.stress.Stress.main(Stress.java:61)
Caused by: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 1.2.3.4; nested exception is: java.net.ConnectException: Connection timed out]
        at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369)
        at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
        at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:188)
        at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:155)
        at org.apache.cassandra.stress.util.JmxCollector.connect(JmxCollector.java:95)
        ... 7 more
Caused by: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 1.2.3.4; nested exception is: java.net.ConnectException: Connection timed out]
        at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:122)
        at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:205)
        at javax.naming.InitialContext.lookup(InitialContext.java:417)
        at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1957)
        at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1924)
        at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287)
        ... 11 more
Caused by: java.rmi.ConnectException: Connection refused to host: 1.2.3.4; nested exception is: java.net.ConnectException: Connection timed out
        at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
        at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
        at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
        at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:342)
        at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
        at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:118)
        ... 16 more

To resolve this, I set the rpc_address property in cassandra.yaml to <host_ip> and commented out the broadcast_rpc_address property.
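As a rough sketch, the relevant lines of cassandra.yaml might look like this after the change. <host_ip> and <previous_value> are placeholders rather than real values, and the rest of the file is assumed to be unchanged. The node normally needs a restart before cassandra.yaml changes take effect.

```yaml
# cassandra.yaml (fragment)

# Bind client RPC to the node's own address instead of 0.0.0.0.
rpc_address: <host_ip>

# Commented out, as described above.
# broadcast_rpc_address: <previous_value>
```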

This works for me and I am not getting that error anymore.
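Since the trace above ends in a connection timeout, a quick reachability check from the stress machine may help narrow things down. The sketch below assumes bash and the coreutils timeout command are available, and that the nodes use Cassandra's default JMX port, 7199. The port_open function name is introduced here for illustration.

```shell
#!/usr/bin/env bash
# Check whether each host given on the command line accepts TCP
# connections on the JMX port, using bash's /dev/tcp pseudo-device.

JMX_PORT=7199   # Cassandra's default JMX port; change if yours differs

port_open() {
    # $1 = host, $2 = port. Exit status 0 if a connection opens within 3s.
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

for host in "$@"; do
    if port_open "$host" "$JMX_PORT"; then
        echo "$host: port $JMX_PORT reachable"
    else
        echo "$host: port $JMX_PORT NOT reachable"
    fi
done
```

Saved under a name of your choosing, it could be run as, for example, ./check_jmx.sh node1 node2 node3. An open port is necessary but not sufficient: RMI can still hand the client a different address or port to connect to, which is consistent with the "Connection refused to host" lines in the trace.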

Shrikant Salgar answered Sep 23 '22 16:09
