I'm not able to run a simple Spark job in Scala IDE (a Maven Spark project) installed on Windows 7. The Spark core dependency has been added.
val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
val sc = new SparkContext(conf)
val logData = sc.textFile("File.txt")
logData.count()
Error:
16/02/26 18:29:33 INFO SparkContext: Created broadcast 0 from textFile at FrameDemo.scala:13
16/02/26 18:29:34 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
    at com.org.SparkDF.FrameDemo$.main(FrameDemo.scala:14)
    at com.org.SparkDF.FrameDemo.main(FrameDemo.scala)
What does Spark need WinUtils for? To run Apache Spark locally on Windows, Spark relies on a piece of the Hadoop code base called 'WinUtils'. It lets Hadoop manage the POSIX-style file-system permissions that HDFS expects of the local file system.
Here is a good explanation of the problem, along with the solution.
Download the version of winutils.exe that matches your Hadoop build from https://github.com/steveloughran/winutils.
Set your HADOOP_HOME environment variable, either at the OS level or programmatically:
System.setProperty("hadoop.home.dir", "full path to the folder that contains bin\winutils.exe");
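For reference, here is a minimal sketch of the whole fix in Scala, assuming winutils.exe has been placed in a hypothetical folder C:\hadoop\bin (note that hadoop.home.dir points at the parent of bin, not at bin itself):

import org.apache.spark.{SparkConf, SparkContext}

object FrameDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: winutils.exe lives at C:\hadoop\bin\winutils.exe,
    // so hadoop.home.dir must point at C:\hadoop (the parent of bin).
    // Setting HADOOP_HOME=C:\hadoop at the OS level has the same effect.
    // Set this before the SparkContext is created, so Hadoop picks it up.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
    val sc = new SparkContext(conf)

    val logData = sc.textFile("File.txt")
    println(s"Line count: ${logData.count()}")

    sc.stop()
  }
}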
Enjoy