I'm trying to use the cassandra-stress tool for the first time. The tool itself runs, but the output is full of "Failed to connect over JMX; not collecting these stats" messages.
Command
cassandra-stress user \
profile=./stress_write.yaml ops\(insert=1\) \
n=1000000 \
-log file=./stress_write.log \
-node node1,node2,node3,node4,node5,node6
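The profile is a basic single-table write workload. A trimmed-down sketch of what stress_write.yaml contains is below; the keyspace/table names, column sizes, and replication settings are illustrative, not my exact schema.

# stress_write.yaml (illustrative sketch, not the actual file)
keyspace: stress_ks
keyspace_definition: |
  CREATE KEYSPACE stress_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
table: stress_tbl
table_definition: |
  CREATE TABLE stress_tbl (
    key text PRIMARY KEY,
    value blob
  );
columnspec:
  - name: key
    size: fixed(16)
  - name: value
    size: fixed(256)
insert:
  partitions: fixed(1)
  batchtype: UNLOGGED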
Output
WARN 19:44:25 Found host with 0.0.0.0 as rpc_address, using listen_address (/node5) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN 19:44:25 Found host with 0.0.0.0 as rpc_address, using listen_address (/node1) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN 19:44:25 Found host with 0.0.0.0 as rpc_address, using listen_address (/node2) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN 19:44:25 Found host with 0.0.0.0 as rpc_address, using listen_address (/node4) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN 19:44:25 Found host with 0.0.0.0 as rpc_address, using listen_address (/node3) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN 19:44:26 Found host with 0.0.0.0 as rpc_address, using listen_address (/node5) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN 19:44:26 Found host with 0.0.0.0 as rpc_address, using listen_address (/node1) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN 19:44:26 Found host with 0.0.0.0 as rpc_address, using listen_address (/node2) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN 19:44:26 Found host with 0.0.0.0 as rpc_address, using listen_address (/node4) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
WARN 19:44:26 Found host with 0.0.0.0 as rpc_address, using listen_address (/node3) to contact it instead. If this is incorrect you should avoid the use of 0.0.0.0 server side.
INFO 19:44:26 Using data-center name 'DC2' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
INFO 19:44:26 New Cassandra host /node2:9042 added
INFO 19:44:26 New Cassandra host /node5:9042 added
Connected to cluster: MyCluster
INFO 19:44:26 New Cassandra host /node4:9042 added
INFO 19:44:26 New Cassandra host /node1:9042 added
INFO 19:44:26 New Cassandra host /node6:9042 added
Datatacenter: DC2; Host: /node4; Rack: rack1
Datatacenter: DC2; Host: /node3; Rack: rack1
Datatacenter: DC2; Host: /node6; Rack: rack1
Datatacenter: DC2; Host: /node5; Rack: rack1
Datatacenter: DC2; Host: /node1; Rack: rack1
Datatacenter: DC2; Host: /node2; Rack: rack1
INFO 19:44:26 New Cassandra host /node3:9042 added
Created schema. Sleeping 6s for propagation.
Failed to connect over JMX; not collecting these stats
Generating batches with [1..1] partitions and [1..1] rows (of [1..1] total rows in the partitions)
Failed to connect over JMX; not collecting these stats
Failed to connect over JMX; not collecting these stats
Improvement over 4 threadCount: 36%
Failed to connect over JMX; not collecting these stats
Improvement over 8 threadCount: 138%
Failed to connect over JMX; not collecting these stats
Improvement over 16 threadCount: 48%
Failed to connect over JMX; not collecting these stats
Improvement over 24 threadCount: 33%
Failed to connect over JMX; not collecting these stats
Improvement over 36 threadCount: 27%
Failed to connect over JMX; not collecting these stats
Improvement over 54 threadCount: 39%
Failed to connect over JMX; not collecting these stats
Improvement over 81 threadCount: 37%
Failed to connect over JMX; not collecting these stats
Improvement over 121 threadCount: 16%
Failed to connect over JMX; not collecting these stats
Improvement over 181 threadCount: 1%
Failed to connect over JMX; not collecting these stats
Improvement over 271 threadCount: 15%
Failed to connect over JMX; not collecting these stats
Improvement over 406 threadCount: 3%
Failed to connect over JMX; not collecting these stats
Improvement over 609 threadCount: -3%
Is there any command-line or file-based configuration parameter that I need to specify for JMX? I have tested and confirmed that connectivity between the stress machine and my nodes is not the issue, because I was able to establish a connection between them via jmxsh.
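For reference, the JMX block in cassandra-env.sh on my nodes is essentially the stock one; I'm reproducing it here from memory, so treat the exact values as illustrative rather than my exact configuration:

# cassandra-env.sh (illustrative; stock defaults)
JMX_PORT="7199"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
# The file also suggests adding this if remote JMX connections are failing:
# JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"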
Another issue with the output, which may or may not be related to the JMX error, is that it is missing some key parts. I'm quoting the sample output from this Datastax documentation page to show the parts that are missing from what I got:
WARNING: uncertainty mode (err<) results in uneven workload between thread runs, so should be used for high level analysis only
Running with 4 threadCount
Running WRITE with 4 threads until stderr of mean < 0.02
total ops , adj row/s, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, gc: #, max ms, sum ms, sdv ms, mb
2552 , 2553, 2553, 2553, 2553, 1.5, 1.4, 2.5, 6.0, 12.6, 18.0, 1.0, 0.00000, 0, 0, 0, 0, 0
5173 , 2634, 2613, 2613, 2613, 1.5, 1.5, 1.8, 2.6, 8.6, 9.2, 2.0, 0.00000, 0, 0, 0, 0, 0
...
Results:
op rate : 3954
partition rate : 3954
row rate : 3954
latency mean : 1.0
latency median : 0.8
latency 95th percentile : 1.5
latency 99th percentile : 1.8
latency 99.9th percentile : 2.2
latency max : 73.6
total gc count : 25
total gc mb : 1826
total gc time (s) : 1
avg gc time(ms) : 37
stdev gc time(ms) : 10
Total operation time : 00:00:59
Sleeping for 15s
Running with 4 threadCount
Notes
I have a similar setup (Cassandra 2.0.12 with the stress tool from 2.1) and saw the same issue. I finally had some time to investigate.
I downloaded the source code and ran it in a debugger. It turns out the error message is misleading: the tool does connect over JMX, but it fails on one of the MBeans (org.apache.cassandra.service:type=GCInspector).
Running the stress test with the option -log level=verbose shows the same failure, with the following exception:
java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy11.getAndResetStats(Unknown Source)
at org.apache.cassandra.tools.NodeProbe.getAndResetGCStats(NodeProbe.java:385)
at org.apache.cassandra.stress.util.JmxCollector.<init>(JmxCollector.java:86)
at org.apache.cassandra.stress.StressMetrics.<init>(StressMetrics.java:64)
at org.apache.cassandra.stress.StressAction.run(StressAction.java:187)
at org.apache.cassandra.stress.StressAction.warmup(StressAction.java:97)
at org.apache.cassandra.stress.StressAction.run(StressAction.java:61)
at org.apache.cassandra.stress.Stress.main(Stress.java:109)
Caused by: javax.management.InstanceNotFoundException: org.apache.cassandra.service:type=GCInspector
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(Unknown Source)
at ....
I connected to Cassandra with JConsole and confirmed that version 2.0.12 does not expose this MBean.
Apart from that, my output has most of the data cited in the sample (except for the garbage-collection statistics).
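A quick way to verify this from the stress machine without a debugger is to ask the MBean server directly whether the GCInspector bean is registered. This is a sketch using the plain JMX API; the host name is a placeholder and 7199 is the default Cassandra JMX port.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CheckGcInspector {
    public static void main(String[] args) throws Exception {
        // Placeholder host; 7199 is Cassandra's default JMX port.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://node1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName gcInspector = new ObjectName("org.apache.cassandra.service:type=GCInspector");
            // On 2.0.x this prints false, which is what trips up the 2.1 stress tool.
            System.out.println("GCInspector registered: " + mbs.isRegistered(gcInspector));
        } finally {
            connector.close();
        }
    }
}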
Have you tried running cassandra-stress with the default configuration? Also try setting the log level to verbose; maybe it will give you some ideas.
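For example, the same command as in the question with verbose logging enabled (assuming the rest of the options stay unchanged):

cassandra-stress user \
profile=./stress_write.yaml ops\(insert=1\) \
n=1000000 \
-log level=verbose file=./stress_write.log \
-node node1,node2,node3,node4,node5,node6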
java.lang.RuntimeException: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 1.2.3.4; nested exception is:
java.net.ConnectException: Connection timed out]
at org.apache.cassandra.stress.util.JmxCollector.connect(JmxCollector.java:99)
at org.apache.cassandra.stress.util.JmxCollector.<init>(JmxCollector.java:85)
at org.apache.cassandra.stress.StressMetrics.<init>(StressMetrics.java:62)
at org.apache.cassandra.stress.StressAction.run(StressAction.java:211)
at org.apache.cassandra.stress.StressAction.warmup(StressAction.java:107)
at org.apache.cassandra.stress.StressAction.run(StressAction.java:60)
at org.apache.cassandra.stress.Stress.run(Stress.java:133)
at org.apache.cassandra.stress.Stress.main(Stress.java:61)
Caused by: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 1.2.3.4; nested exception is:
java.net.ConnectException: Connection timed out]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:188)
at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:155)
at org.apache.cassandra.stress.util.JmxCollector.connect(JmxCollector.java:95)
... 7 more
Caused by: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 1.2.3.4; nested exception is:
java.net.ConnectException: Connection timed out]
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:122)
at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:205)
at javax.naming.InitialContext.lookup(InitialContext.java:417)
at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1957)
at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1924)
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287)
... 11 more
Caused by: java.rmi.ConnectException: Connection refused to host: 1.2.3.4; nested exception is:
java.net.ConnectException: Connection timed out
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:342)
at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:118)
... 16 more
To resolve this issue, I set the rpc_address property in cassandra.yaml to <host_ip> and commented out the broadcast_rpc_address property.
This works for me, and I am no longer getting that error.
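In cassandra.yaml terms the change looks roughly like this (the address is a placeholder for each node's own IP):

# cassandra.yaml (per node; the address shown is a placeholder)
# Bind the client (RPC/native) interface to the node's own IP instead of 0.0.0.0,
# so drivers and cassandra-stress are handed a routable address.
rpc_address: 10.0.0.11

# broadcast_rpc_address is only required when rpc_address is 0.0.0.0,
# so it stays commented out here.
# broadcast_rpc_address: 10.0.0.11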