How to report JMX from Spark Streaming on EC2 to VisualVM?

I have been trying to get a Spark Streaming job running on an EC2 instance to report to VisualVM using JMX.

As of now I have the following config file:

spark/conf/metrics.properties:

*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

And I start the Spark Streaming job like this (I added the -D flags afterwards in the hope of getting remote access to the EC2 instance's JMX):

terminal:

spark/bin/spark-submit --class my.class.StarterApp --master local --deploy-mode client \
  project-1.0-SNAPSHOT.jar \
    -Dcom.sun.management.jmxremote \
    -Dcom.sun.management.jmxremote.port=54321 \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.ssl=false
asked Sep 30 '22 by Havnar

1 Answer

There are two issues with the spark-submit command line:

  1. local - you must not run a Spark Streaming application with the plain local master URL, because there will be no threads left to run your computations (jobs): you need at least two, i.e. one for the receiver and at least one more to process the received data. You should see the following WARN in the logs:

WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data, otherwise Spark jobs will not get resources to process the received data.

  2. The -D options are not picked up by the JVM because they are given after the Spark Streaming application's jar and effectively become its command-line arguments. Put them before project-1.0-SNAPSHOT.jar and start over (you have to fix the above issue first!), as sketched below.
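For reference, here is a sketch of what the corrected command could look like, assuming a single receiver (hence local[2]) and that port 54321 is open in the instance's security group; --driver-java-options is used here as one supported way to hand the -D flags to the driver JVM rather than placing them before the jar:

# corrected: master has more than one thread, and the JMX flags go to the driver JVM, not to the application
spark/bin/spark-submit --class my.class.StarterApp --master "local[2]" --deploy-mode client \
  --driver-java-options "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=54321 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" \
  project-1.0-SNAPSHOT.jar

Depending on the EC2 networking setup, you may also need -Djava.rmi.server.hostname=<public-dns> and -Dcom.sun.management.jmxremote.rmi.port=54321 so that the JMX RMI connector is reachable through the firewall before VisualVM can attach to <public-dns>:54321.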
answered Oct 08 '22 by Jacek Laskowski