Spark on YARN: how to send metrics to a Graphite sink?

I am new to Spark and we are running Spark on YARN. I can run my test applications just fine. I am trying to collect the Spark metrics in Graphite. I know what changes to make to the metrics.properties file, but how will my Spark application see this conf file?

/xxx/spark/spark-0.9.0-incubating-bin-hadoop2/bin/spark-class org.apache.spark.deploy.yarn.Client \
  --jar /xxx/spark/spark-0.9.0-incubating-bin-hadoop2/examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.0-incubating.jar \
  --addJars "hdfs://host:port/spark/lib/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar" \
  --class org.apache.spark.examples.Test \
  --args yarn-standalone \
  --num-workers 50 \
  --master-memory 1024m \
  --worker-memory 1024m \
  --args "xx"

Where should I be specifying the metrics.properties file?

I made these changes to it:

*.sink.Graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.Graphite.host=machine.domain.com
*.sink.Graphite.port=2003

master.source.jvm.class=org.apache.spark.metrics.source.JvmSource

worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource

driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource

executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
asked May 07 '14 by user3614090


2 Answers

I have found a different solution to the same problem. It looks like Spark can also take these metric settings from its configuration properties. For example, the following line from metrics.properties:

*.sink.Graphite.class=org.apache.spark.metrics.sink.GraphiteSink

can also be specified as a Spark property with the key spark.metrics.conf.*.sink.Graphite.class and the value org.apache.spark.metrics.sink.GraphiteSink. You just need to prepend spark.metrics.conf. to each key.

I ended up putting all these settings in the code like this:

import org.apache.spark.{SparkConf, SparkContext}

// Prefix each metrics.properties key with "spark.metrics.conf." and set it
// directly on the SparkConf; graphiteHostName is defined elsewhere in the app.
val sparkConf = new SparkConf()
  .set("spark.metrics.conf.*.sink.graphite.class", "org.apache.spark.metrics.sink.GraphiteSink")
  .set("spark.metrics.conf.*.sink.graphite.host", graphiteHostName)
  // etc.
val sc = new SparkContext(sparkConf)

This way I've got the metrics sink set up for both the driver and the executors. I was using Spark 1.6.0.
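
If you launch with spark-submit instead of building the SparkConf in code, the same keys can be passed as --conf flags. A minimal sketch, assuming a Graphite server at machine.domain.com:2003 as in the question; the class name and application JAR are placeholders, and keys containing * are safest quoted for the shell:

# Hypothetical invocation: host, port, class, and app.jar are placeholders.
spark-submit \
  --master yarn \
  --conf "spark.metrics.conf.*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink" \
  --conf "spark.metrics.conf.*.sink.graphite.host=machine.domain.com" \
  --conf "spark.metrics.conf.*.sink.graphite.port=2003" \
  --class org.apache.spark.examples.Test \
  app.jar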

answered by Gábor Fehér


I struggled with the same thing. I got it working using these flags:

--files=/path/to/metrics.properties --conf spark.metrics.conf=metrics.properties

It's tricky because the --files flag means your /path/to/metrics.properties file ends up in every executor's local working directory as metrics.properties; as far as I know, there's no way to specify a more complex directory structure there, or to have two files with the same basename.
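
Put together, a YARN launch using this approach might look like the following sketch (the deploy mode, class name, and application JAR are placeholders):

# Hypothetical full command; --deploy-mode, --class, and app.jar are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /path/to/metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  --class org.apache.spark.examples.Test \
  app.jar

Because --files drops the file into each container's working directory, spark.metrics.conf refers to it by its basename only.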

Relatedly, I filed SPARK-5152 about letting the spark.metrics.conf file be read from HDFS, but that seems like it would require a fairly invasive change, so I'm not holding my breath on that one.

answered by Ryan Williams