I am using spark-1.5.0-cdh5.6.0. I tried the sample application (Scala); the command is:
> spark-submit --class com.cloudera.spark.simbox.sparksimbox.WordCount --master local /home/hadoop/work/testspark.jar
Got the following error:
ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File file:/user/spark/applicationHistory does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:424)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
at com.cloudera.spark.simbox.sparksimbox.WordCount$.main(WordCount.scala:12)
at com.cloudera.spark.simbox.sparksimbox.WordCount.main(WordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Spark has a feature called the "history server" which allows you to browse historical events after the SparkContext dies. Event logging is enabled by setting `spark.eventLog.enabled` to `true`.
You have two options: either specify a valid directory to store the event log via the `spark.eventLog.dir` config value, or simply set `spark.eventLog.enabled` to `false` if you don't need it.
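For example, here is a minimal sketch of how either option could be applied when building the SparkConf in the WordCount driver. The log directory path below is just an illustration, not something from your setup; the directory must exist before the application starts (e.g. `mkdir -p /tmp/spark-events`):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCount")
      // Option 1: keep event logging, but point it at a directory that exists
      // (hypothetical path -- adjust to your environment)
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "file:///tmp/spark-events")
      // Option 2: or turn event logging off entirely
      // .set("spark.eventLog.enabled", "false")

    val sc = new SparkContext(conf)
    // ... word count logic ...
    sc.stop()
  }
}
```

The same settings can also be passed at submit time without touching the code, e.g. `spark-submit --conf spark.eventLog.enabled=false --class com.cloudera.spark.simbox.sparksimbox.WordCount --master local /home/hadoop/work/testspark.jar`.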
You can read more about this on the Spark Configuration page.