I have set up a Spark 2.1.1 cluster (1 master, 2 slaves) following http://paxcel.net/blog/how-to-setup-apache-spark-standalone-cluster-on-multiple-machine/ in standalone mode. I do not have a pre-existing Hadoop setup on any of the machines. I wanted to start the Spark history server, so I ran it as follows:
roshan@bolt:~/spark/spark_home/sbin$ ./start-history-server.sh
and in spark-defaults.conf I set this:
spark.eventLog.enabled true
But it fails with the error:
17/06/29 22:59:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(roshan); groups with view permissions: Set(); users with modify permissions: Set(roshan); groups with modify permissions: Set()
17/06/29 22:59:03 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:278)
at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.io.FileNotFoundException: Log directory specified does not exist: file:/tmp/spark-events Did you configure the correct one through spark.history.fs.logDirectory?
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider.scala:214)
What should I set spark.history.fs.logDirectory and spark.eventLog.dir to?
Update 1:
spark.eventLog.enabled true
spark.history.fs.logDirectory file:////home/roshan/spark/spark_home/logs
spark.eventLog.dir file:////home/roshan/spark/spark_home/logs
but I always get this error:
java.lang.IllegalArgumentException: Codec [1] is not available. Consider setting spark.io.compression.codec=snappy at org.apache.spark.io.Co
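The error itself suggests pinning the compression codec, so one thing to try in spark-defaults.conf would be the following (I have not confirmed this fixes the root cause):

spark.io.compression.codec snappy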
To run an application on the Spark cluster, simply pass the spark://IP:PORT URL of the master to the SparkContext constructor. You can also pass the option --total-executor-cores <numCores> to control the number of cores that spark-shell uses on the cluster.
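For example, a spark-shell session against the standalone master could be started like this (the host and core count are placeholders):

./bin/spark-shell --master spark://<master-host>:7077 --total-executor-cores 2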
Spark's standalone mode offers a web-based user interface to monitor the cluster. The master and each worker have their own web UI showing cluster and job statistics. By default, you can access the master's web UI at port 8080. The port can be changed either in the configuration file or via command-line options.
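As an illustration, both mechanisms are supported by the stock scripts (the port value below is arbitrary):

./sbin/start-master.sh --webui-port 8081

or, in conf/spark-env.sh:

SPARK_MASTER_WEBUI_PORT=8081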
The Spark History Server is a web UI used to monitor the metrics and performance of completed Spark applications.
To install Spark in standalone mode, you simply place a compiled version of Spark on each node of the cluster. You can obtain pre-built versions of Spark with each release or build it yourself. You can start a standalone master server by executing:
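./sbin/start-master.sh

Workers can then be attached to it with ./sbin/start-slave.sh spark://<master-host>:7077 (both scripts ship in sbin/ of the stock Spark 2.x distribution; the host is a placeholder).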
The Spark history server keeps the history of event logs for completed applications. In order to store event logs for all submitted applications, Spark first needs to collect the information while the applications are running. By default, Spark does not collect event log information; you can enable it by setting the configs below in spark-defaults.conf.
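A minimal sketch of those settings (the directory below is a placeholder; it must exist and be writable by the user submitting applications):

spark.eventLog.enabled true
spark.eventLog.dir file:///<path-to-event-log-dir>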
By default, Spark defines file:/tmp/spark-events as the log directory for the history server, and your log clearly says spark.history.fs.logDirectory is not configured.
First of all, you need to create a spark-events folder in /tmp (which is not a good idea, as /tmp is cleared every time the machine is rebooted) and then add spark.history.fs.logDirectory to spark-defaults.conf, pointing it to that directory. But I suggest you create another folder that the spark user has access to, and update the spark-defaults.conf file accordingly.
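For example, a minimal sketch assuming a user named spark and the /opt/spark-events path used below:

# create the directory and give the spark user ownership (user/group names are assumptions)
sudo mkdir -p /opt/spark-events
sudo chown spark:spark /opt/spark-events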
You need to define two more variables in the spark-defaults.conf file:
spark.eventLog.dir file:path to where you want to store your logs
spark.history.fs.logDirectory file:same path as above
Suppose you want to store the logs in /opt/spark-events, where the spark user has access; then the above parameters in spark-defaults.conf would be:
spark.eventLog.enabled true
spark.eventLog.dir file:/opt/spark-events
spark.history.fs.logDirectory file:/opt/spark-events
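After creating the directory and updating spark-defaults.conf, restart the history server and verify it picks up the logs (18080 is the default history server UI port; the host is a placeholder):

./sbin/stop-history-server.sh
./sbin/start-history-server.sh
# then browse to http://<history-server-host>:18080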
You can find more information in Spark's Monitoring and Instrumentation documentation.