I have been trying to get a Spark Streaming job, running on an EC2 instance, to report to VisualVM using JMX.
As of now I have the following config file:
spark/conf/metrics.properties:
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
And I start the Spark Streaming job like this (I added the -D options afterwards, hoping to get remote access to the EC2 instance's JMX):
terminal:
spark/bin/spark-submit --class my.class.StarterApp --master local --deploy-mode client \
project-1.0-SNAPSHOT.jar \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=54321 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false
There are two issues with the spark-submit command line:
local - you must not run a Spark Streaming application with the bare local master URL, because that gives the application a single thread only: the receiver occupies it and nothing is left to process the received data. You need at least two threads, one for the receiver and one for the computations, i.e. local[n] with n > 1. You should see the following WARN in the logs:

WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data, otherwise Spark jobs will not get resources to process the received data.
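A minimal sketch of this fix, keeping everything else from the command in the question unchanged (local[2] leaves one thread for the receiver and one for processing; local[*] would use all available cores):

spark/bin/spark-submit --class my.class.StarterApp --master "local[2]" --deploy-mode client \
project-1.0-SNAPSHOT.jar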
-D options are not picked up by the JVM because they come after the application jar, so they effectively become command-line arguments of the application itself. They have to appear before project-1.0-SNAPSHOT.jar; start over once you have also fixed the first issue.
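Note that spark-submit itself may not accept bare -D flags either; the usual way to hand JVM options to the driver is the --driver-java-options flag (or the spark.driver.extraJavaOptions config property). A sketch of the full corrected command under that assumption, reusing the port from the question:

spark/bin/spark-submit \
  --class my.class.StarterApp \
  --master "local[2]" \
  --deploy-mode client \
  --driver-java-options "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=54321 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" \
  project-1.0-SNAPSHOT.jar

For remote access from outside EC2 you may additionally need -Djava.rmi.server.hostname=<public hostname> and the port opened in the instance's security group, since JMX over RMI is sensitive to NAT; treat that as an assumption to verify for your setup.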