
How to enable spark-history server for standalone cluster non hdfs mode

I have set up a Spark 2.1.1 cluster (1 master, 2 slaves) following http://paxcel.net/blog/how-to-setup-apache-spark-standalone-cluster-on-multiple-machine/ in standalone mode. I do not have a pre-existing Hadoop setup on any of the machines. I wanted to start the Spark history server. I run it as follows:

roshan@bolt:~/spark/spark_home/sbin$ ./start-history-server.sh

and in the spark-defaults.conf I set this:

spark.eventLog.enabled           true

But it fails with the error:

17/06/29 22:59:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(roshan); groups with view permissions: Set(); users  with modify permissions: Set(roshan); groups with modify permissions: Set()
17/06/29 22:59:03 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:278)
    at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.io.FileNotFoundException: Log directory specified does not exist: file:/tmp/spark-events Did you configure the correct one through spark.history.fs.logDirectory?
    at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider.scala:214)

What should I set spark.history.fs.logDirectory and spark.eventLog.dir to?

Update 1:

spark.eventLog.enabled           true
spark.history.fs.logDirectory   file:////home/roshan/spark/spark_home/logs
spark.eventLog.dir               file:////home/roshan/spark/spark_home/logs

but I am always getting this error:

java.lang.IllegalArgumentException: Codec [1] is not available. Consider setting spark.io.compression.codec=snappy at org.apache.spark.io.Co
asked Jun 29 '17 by Roshan Mehta




1 Answer

By default Spark uses file:/tmp/spark-events as the log directory for the history server, and your log clearly says that this directory does not exist and that spark.history.fs.logDirectory is not configured.

First of all, you need to create the spark-events folder in /tmp (which is not a good idea, as /tmp is cleared every time the machine reboots) and then add spark.history.fs.logDirectory in spark-defaults.conf to point to that directory. But I suggest you create another folder that the spark user has access to and update the spark-defaults.conf file accordingly, as sketched below.
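For example, a minimal sketch of preparing such a directory (assuming /opt/spark-events as the target path, and that Spark runs as the roshan user seen in your logs):

sudo mkdir -p /opt/spark-events                  # create a persistent event-log directory
sudo chown -R roshan:roshan /opt/spark-events    # give the Spark user write access to it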

You need to define two more properties in the spark-defaults.conf file:

spark.eventLog.dir              file:<path where you want to store your logs>
spark.history.fs.logDirectory   file:<same path as above>

Suppose you want to store the logs in /opt/spark-events, which the spark user has access to; then the above parameters in spark-defaults.conf would be:

spark.eventLog.enabled          true
spark.eventLog.dir              file:/opt/spark-events
spark.history.fs.logDirectory   file:/opt/spark-events
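After updating spark-defaults.conf, restart the history server and submit an application so that events get written; a minimal sketch, assuming you run the commands from your Spark home and the history server listens on its default port 18080:

sbin/stop-history-server.sh        # stop the server if it is already running
sbin/start-history-server.sh       # restart so it picks up the new log directory
bin/spark-submit --master spark://<master-ip>:7077 examples/src/main/python/pi.py 10
# the completed application should then appear in the UI at http://<history-server-host>:18080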

You can find more information in the Monitoring and Instrumentation page of the Spark documentation.

answered Sep 28 '22 by Ramesh Maharjan