I would like to set spark.eventLog.enabled
and spark.eventLog.dir
at the spark-submit
or start-all
level -- rather than requiring them to be enabled in the Scala/Java/Python code.
I have tried various things with no success:
In spark-defaults.conf, as:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://namenode:8021/directory
or:
spark.eventLog.enabled true
spark.eventLog.dir file:///some/where
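(In case the location matters: spark-defaults.conf is read from $SPARK_HOME/conf, or from $SPARK_CONF_DIR if that is set, on the machine the job is launched from; if only the shipped template exists, it can be copied first, roughly:)
# create spark-defaults.conf from the template, then append the two settings shown above
cp $SPARK_HOME/conf/spark-defaults.conf.template $SPARK_HOME/conf/spark-defaults.conf
echo "spark.eventLog.enabled true" >> $SPARK_HOME/conf/spark-defaults.conf
echo "spark.eventLog.dir hdfs://namenode:8021/directory" >> $SPARK_HOME/conf/spark-defaults.conf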
With spark-submit, as:
spark-submit --conf "spark.eventLog.enabled=true" --conf "spark.eventLog.dir=file:///tmp/test" --master spark://server:7077 examples/src/main/python/pi.py
SPARK_DAEMON_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkHistoryDir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"
and just for overkill:
SPARK_HISTORY_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkHistoryDir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"
Where and how must these things be set to get history on arbitrary jobs?
Spark keeps a history of every application you run by creating a sub-directory for each application and logging the application-specific events there. You can also set the location to a shared path such as an HDFS directory so the history files can be read by the history server.
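For example, a minimal sketch of pointing the history server at that same directory and starting it with the bundled script (the address and path are placeholders):
# in conf/spark-env.sh on the host that should run the history server
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://namenode:8021/directory"
# start it; the web UI listens on port 18080 by default
$SPARK_HOME/sbin/start-history-server.sh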
The spark-submit script in Spark's bin directory is used to launch applications on a cluster. It can use all of Spark's supported cluster managers through a uniform interface so you don't have to configure your application especially for each one.
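For instance, the same example application can be handed to different cluster managers just by changing --master (the hosts below are placeholders):
# local mode, standalone cluster, and YARN: only the master (and deploy mode) changes
spark-submit --master local[4] examples/src/main/python/pi.py
spark-submit --master spark://server:7077 examples/src/main/python/pi.py
spark-submit --master yarn --deploy-mode cluster examples/src/main/python/pi.py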
You can view the status of a Spark Application that is created for the notebook in the status widget on the notebook panel. The widget also displays links to the Spark UI, Driver Logs, and Kernel Log. Additionally, you can view the progress of the Spark job when you run the code.
I solved the problem, yet strangely I had tried this before... All the same, now it seems like a stable solution:
Create a directory in HDFS for logging, say /eventLogging:
hdfs dfs -mkdir /eventLogging
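If jobs will be submitted by a different HDFS user than the one that created the directory, that user also needs write access to it; one deliberately permissive example (adjust to your own policy):
hdfs dfs -chmod 777 /eventLogging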
Then spark-shell
or spark-submit
(or whatever) can be run with the following options:
--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsNameNodeAddress>:8020/eventLogging
such as:
spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsNameNodeAddress>:8020/eventLogging
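To verify, list the directory after a job finishes; each completed application should show up as an entry there, and a history server pointed at the same path (as sketched earlier) will show it in its web UI:
hdfs dfs -ls hdfs://<hdfsNameNodeAddress>:8020/eventLogging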