I have set up a Spark 2.1.1 cluster (1 master, 2 slaves) following http://paxcel.net/blog/how-to-setup-apache-spark-standalone-cluster-on-multiple-machine/ in standalone mode. I do not have a pre-existing Hadoop setup on any of the machines. I wanted to start the Spark history server, so I ran it as follows:
roshan@bolt:~/spark/spark_home/sbin$ ./start-history-server.sh
and in spark-defaults.conf I set this:
spark.eventLog.enabled true
But it fails with the error:
17/06/29 22:59:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(roshan); groups with view permissions: Set(); users with modify permissions: Set(roshan); groups with modify permissions: Set()
17/06/29 22:59:03 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:278)
at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.io.FileNotFoundException: Log directory specified does not exist: file:/tmp/spark-events Did you configure the correct one through spark.history.fs.logDirectory?
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider.scala:214)
What should I set spark.history.fs.logDirectory and spark.eventLog.dir to?
Update 1:
spark.eventLog.enabled true
spark.history.fs.logDirectory file:////home/roshan/spark/spark_home/logs
spark.eventLog.dir file:////home/roshan/spark/spark_home/logs
but I always get this error:
java.lang.IllegalArgumentException: Codec [1] is not available. Consider setting spark.io.compression.codec=snappy at org.apache.spark.io.Co
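The error itself suggests pinning the compression codec, so one thing to try in spark-defaults.conf would be the following (I have not confirmed this fixes the root cause):

spark.io.compression.codec snappy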
To run an application on the Spark cluster, simply pass the spark://IP:PORT URL of the master to the SparkContext constructor. You can also pass the option --total-executor-cores <numCores> to control the number of cores that spark-shell uses on the cluster.
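For example, a spark-shell session against the standalone master could be started like this (the host and core count are placeholders):

./bin/spark-shell --master spark://<master-host>:7077 --total-executor-cores 2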
Spark's standalone mode offers a web-based user interface to monitor the cluster. The master and each worker have their own web UI showing cluster and job statistics. By default, you can access the master's web UI at port 8080. The port can be changed either in the configuration file or via command-line options.
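As an illustration, both mechanisms are supported by the stock scripts (the port value below is arbitrary):

./sbin/start-master.sh --webui-port 8081

or, in conf/spark-env.sh:

SPARK_MASTER_WEBUI_PORT=8081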
The Spark History Server is a web UI used to monitor the metrics and performance of completed Spark applications.
To install Spark in standalone mode, you simply place a compiled version of Spark on each node of the cluster. You can obtain pre-built versions of Spark with each release or build it yourself. You can start a standalone master server by executing:
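./sbin/start-master.sh

Workers can then be attached to it with ./sbin/start-slave.sh spark://<master-host>:7077 (both scripts ship in sbin/ of the stock Spark 2.x distribution; the host is a placeholder).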
The Spark history server keeps the history of event logs for completed applications. In order to store event logs for all submitted applications, Spark first needs to collect the information while the applications are running. By default, Spark does not collect event log information; you can enable it by setting the configs below in spark-defaults.conf.
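A minimal sketch of those settings (the directory below is a placeholder; it must exist and be writable by the user submitting applications):

spark.eventLog.enabled true
spark.eventLog.dir file:///<path-to-event-log-dir>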
By default, Spark defines file:/tmp/spark-events as the log directory for the history server, and your log clearly says spark.history.fs.logDirectory is not configured.
First of all, you need to create a spark-events folder in /tmp (which is not a good idea, as /tmp is cleared every time the machine is rebooted) and then add spark.history.fs.logDirectory to spark-defaults.conf, pointing it to that directory. But I suggest you create another folder that the spark user has access to, and update the spark-defaults.conf file accordingly.
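For example, a minimal sketch assuming a user named spark and the /opt/spark-events path used below:

# create the directory and give the spark user ownership (user/group names are assumptions)
sudo mkdir -p /opt/spark-events
sudo chown spark:spark /opt/spark-events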
You need to define two more variables in the spark-defaults.conf file:
spark.eventLog.dir file:path to where you want to store your logs
spark.history.fs.logDirectory file:same path as above
Suppose you want to store the logs in /opt/spark-events, where the spark user has access; then the above parameters in spark-defaults.conf would be:
spark.eventLog.enabled true
spark.eventLog.dir file:/opt/spark-events
spark.history.fs.logDirectory file:/opt/spark-events
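After creating the directory and updating spark-defaults.conf, restart the history server and verify it picks up the logs (18080 is the default history server UI port; the host is a placeholder):

./sbin/stop-history-server.sh
./sbin/start-history-server.sh
# then browse to http://<history-server-host>:18080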
You can find more information in Spark's Monitoring and Instrumentation documentation.