What's the difference between spark.eventLog.dir and spark.history.fs.logDirectory?

Tags:

apache-spark

In Spark, what is the difference between the event log directory and the history server log directory?

spark.eventLog.dir hdfs:///var/log/spark/apps
spark.history.fs.logDirectory hdfs:///var/log/spark/apps


asked Aug 14 '15 02:08

Ranjit Iyer

1 Answer

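spark.eventLog.dir is the directory where running Spark applications write their event logs, while spark.history.fs.logDirectory is the directory the Spark History Server reads event logs from. For the History Server to show an application, the two should normally point to the same location, as they do in your example.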

From the official documentation of Apache Spark:

spark.eventLog.dir is the base directory in which Spark events are logged, if spark.eventLog.enabled is true. Within this base directory, Spark creates a sub-directory for each application, and logs the events specific to the application in this directory. Users may want to set this to a unified location like an HDFS directory so history files can be read by the history server.

See spark.eventLog.dir.

spark.history.fs.logDirectory is for the filesystem history provider, the URL to the directory containing application event logs to load. This can be a local file:// path, an HDFS path hdfs://namenode/shared/spark-logs or that of an alternative filesystem supported by the Hadoop APIs.

See spark.history.fs.logDirectory.
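
To make the write side / read side relationship concrete, here is a minimal sketch in plain Python (no Spark required) that reads spark-defaults.conf style text and reports when the two settings disagree. The function names `parse_spark_conf` and `check_event_log_settings` are hypothetical helpers made up for this illustration, not part of Spark, and the parser only assumes simple whitespace-separated `key value` lines like the ones in the question.

```python
def parse_spark_conf(text):
    """Parse simple 'key value' lines (hypothetical helper, simplified format)."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def check_event_log_settings(conf):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # Applications only write event logs when this flag is true.
    if conf.get("spark.eventLog.enabled", "false").lower() != "true":
        problems.append("spark.eventLog.enabled is not true, so no event logs are written")
    write_dir = conf.get("spark.eventLog.dir")            # where applications write
    read_dir = conf.get("spark.history.fs.logDirectory")  # where the History Server reads
    if write_dir is None or read_dir is None:
        problems.append("one of the two directories is not set explicitly")
    elif write_dir.rstrip("/") != read_dir.rstrip("/"):
        problems.append(
            "applications write to %s but the History Server reads %s" % (write_dir, read_dir)
        )
    return problems


SAMPLE = """\
spark.eventLog.enabled true
spark.eventLog.dir hdfs:///var/log/spark/apps
spark.history.fs.logDirectory hdfs:///var/log/spark/apps
"""

print(check_event_log_settings(parse_spark_conf(SAMPLE)))  # prints []
```

The check is a plain string comparison, so two different URLs that resolve to the same directory would be reported as a mismatch; treat it as a sanity check rather than a definitive validation.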

answered Oct 07 '22 17:10

enrique-carbonell

