 

What's the difference between spark.eventLog.dir and spark.history.fs.logDirectory?

Tags:

apache-spark

In Spark, what is the difference between the event log directory and the history server log directory?

spark.eventLog.dir hdfs:///var/log/spark/apps
spark.history.fs.logDirectory hdfs:///var/log/spark/apps
Ranjit Iyer asked Aug 14 '15 02:08




1 Answer

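spark.eventLog.dir is the directory where a running Spark application writes its event logs, while spark.history.fs.logDirectory is the directory where the Spark History Server looks for event logs to load. One is the write side and the other is the read side, so the two normally point to the same location, as they do in the question.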

From the official documentation of Apache Spark:

spark.eventLog.dir is the base directory in which Spark events are logged, if spark.eventLog.enabled is true. Within this base directory, Spark creates a sub-directory for each application, and logs the events specific to the application in this directory. Users may want to set this to a unified location like an HDFS directory so history files can be read by the history server.

See spark.eventLog.dir.

spark.history.fs.logDirectory is for the filesystem history provider, the URL to the directory containing application event logs to load. This can be a local file:// path, an HDFS path hdfs://namenode/shared/spark-logs or that of an alternative filesystem supported by the Hadoop APIs.

See spark.history.fs.logDirectory.
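
To make the writer/reader relationship concrete, here is a minimal Python sketch that checks whether a set of properties would let the History Server find an application's logs. The helper names (parse_conf, history_server_sees_app_logs) and the sample text are hypothetical and not part of Spark. The check is a plain string comparison of the two directories and assumes whitespace-separated "key value" lines, so treat it as an illustration rather than how Spark resolves paths.

```python
# Hypothetical sample in spark-defaults.conf style; the directories are the
# ones from the question, and spark.eventLog.enabled is added so that event
# logs are actually written.
SAMPLE_CONF = """
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///var/log/spark/apps
spark.history.fs.logDirectory    hdfs:///var/log/spark/apps
"""


def parse_conf(text):
    """Parse whitespace-separated 'key value' lines, skipping blanks and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def history_server_sees_app_logs(conf):
    """Return True if event logging is on and both properties name the same directory.

    Assumption: both directories are set explicitly and are compared as
    strings, ignoring only a trailing slash.
    """
    enabled = conf.get("spark.eventLog.enabled", "false").strip().lower() == "true"
    write_dir = conf.get("spark.eventLog.dir")             # where applications write
    read_dir = conf.get("spark.history.fs.logDirectory")   # where the History Server reads
    if not enabled or not write_dir or not read_dir:
        return False
    return write_dir.rstrip("/") == read_dir.rstrip("/")


if __name__ == "__main__":
    print(history_server_sees_app_logs(parse_conf(SAMPLE_CONF)))  # True
```

If the two properties pointed at different directories, or spark.eventLog.enabled were left unset, the sketch would return False. That mirrors the usual symptom of an empty History Server UI.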

enrique-carbonell answered Oct 07 '22 17:10