 

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries (Spark in Eclipse on Windows 7)

I'm not able to run a simple Spark job from the Scala IDE (a Maven Spark project) installed on Windows 7.

The Spark core dependency has been added.

    val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
    val sc = new SparkContext(conf)
    val logData = sc.textFile("File.txt")
    logData.count()

Error:

16/02/26 18:29:33 INFO SparkContext: Created broadcast 0 from textFile at FrameDemo.scala:13
16/02/26 18:29:34 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
    at com.org.SparkDF.FrameDemo$.main(FrameDemo.scala:14)
    at com.org.SparkDF.FrameDemo.main(FrameDemo.scala)
Elvish_Blade asked Feb 26 '16


People also ask

What is Winutils spark?

What does Spark need winutils for? To run Apache Spark locally on Windows, it needs a component of the Hadoop code base known as 'winutils'. It provides the POSIX-style file-system permission operations that Hadoop's file-system layer requires of the local file system.
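This also explains the odd-looking path in the error message. A minimal Scala sketch of what Hadoop's Shell class effectively does (the property and variable names are real; the lookup below is a simplified illustration, not Hadoop's actual code):

    // Hadoop resolves the winutils path from the hadoop.home.dir system
    // property, falling back to the HADOOP_HOME environment variable.
    val home: String = sys.props.get("hadoop.home.dir")
      .orElse(sys.env.get("HADOOP_HOME"))
      .orNull
    // When neither is set, home is null, and string interpolation renders it
    // as the literal text "null" -- hence "null\bin\winutils.exe".
    println(s"$home\\bin\\winutils.exe")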


1 Answer

Here is a good explanation of your problem, along with the solution.

  1. Download winutils.exe for your Hadoop version from https://github.com/steveloughran/winutils and place it in a bin folder, e.g. C:\hadoop\bin\winutils.exe.

  2. Set the HADOOP_HOME environment variable at the OS level (e.g. setx HADOOP_HOME C:\hadoop) or set it programmatically. Either way, it must point to the folder that contains bin, not to bin itself; a complete sketch follows this list:

    System.setProperty("hadoop.home.dir", "full path to the folder whose bin subfolder holds winutils.exe")

  3. Enjoy
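Putting the pieces together, a minimal sketch of the original job with the fix applied (the path C:\hadoop is an assumption; substitute wherever you extracted winutils.exe):

    import org.apache.spark.{SparkConf, SparkContext}

    object FrameDemo {
      def main(args: Array[String]): Unit = {
        // Assumption: winutils.exe was extracted to C:\hadoop\bin\winutils.exe.
        // This must run before the SparkContext is created, because Hadoop's
        // Shell class reads hadoop.home.dir when it is first loaded.
        System.setProperty("hadoop.home.dir", "C:\\hadoop")

        val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
        val sc = new SparkContext(conf)
        val logData = sc.textFile("File.txt")
        println(logData.count())
        sc.stop()
      }
    }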

Taky answered Oct 05 '22