Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

winutils spark windows installation env_variable

I am trying to install Spark 1.6.1 on windows 10 and so far I have done the following...

  1. Downloaded spark 1.6.1, unpacked to some directory and then set SPARK_HOME
  2. Downloaded scala 2.11.8, unpacked to some directory and then set SCALA_HOME
  3. Set the _JAVA_OPTION env variable
  4. Downloaded the winutils from https://github.com/steveloughran/winutils.git by just downloading the zip directory and then set HADOOP_HOME env variable. (Not sure if this was incorrect, I could not clone the directory because of permission denied).

When I go to spark home and run bin\spark-shell I get

'C:\Program' is not recognized as an internal or external command, operable program or batch file.

I must be missing something, I don't see how I could be running the bash scripts anyway from windows environment. But hopefully I don't need to understand just to get this working. I have been following this guy's tutorial - https://hernandezpaul.wordpress.com/2016/01/24/apache-spark-installation-on-windows-10/ . Any help would be appreciated.

like image 361
uh_big_mike_boi Avatar asked May 18 '16 16:05

uh_big_mike_boi


People also ask

How do I use winutils with spark?

In order for Spark to use the WinUtils executable, you should create a local directory with a ‘\bin’ subdirectory as suggested below: Copy the winutils.exe, libwinutils.lib, hadoop.dll and hadoop.lib files files generated earlier to this destination.

How do I set environment variables in spark?

We have to setup below environment variables to let spark know where the required files are. In Start Menu type ‘Environment Variables’ and select ‘ Edit the system environment variables ’ Click on ‘ Environment Variables… ’ Find it in ‘Program Files’ or ‘Program Files (x86)’ based on which version was installed above.

What is wunutils Exe in spark?

winutils.exe enables Spark to use Windows-specific services including running shell commands on a windows environment. Download wunutils.exe for Hadoop 2.7 and copy it to %SPARK_HOME%\bin folder.

How to run Apache Spark on Windows?

To run Apache Spark on windows, you need winutils.exe as it uses POSIX like file access operations in windows using windows API. winutils.exe enables Spark to use Windows-specific services including running shell commands on a windows environment. Download wunutils.exe for Hadoop 2.7 and copy it to %SPARK_HOME%\bin folder.


1 Answers

You need to download the winutils executable, not source code.

You can download it here, or if you really want the entire Hadoop distribution you can find the 2.6.0 binaries here. Then, you need to set HADOOP_HOME to the directory containing winutils.exe.

Also, make sure the directory you place Spark in is a directory that doesn't contain whitespaces, this is extremely important otherwise it won't work.

Once you've set it up, you don't start spark-shell.sh, you start spark-shell.cmd:

C:\Spark\bin>spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
16/05/18 19:31:56 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-core-3.2.10.jar."
16/05/18 19:31:56 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-api-jdo-3.2.6.jar."
16/05/18 19:31:56 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-rdbms-3.2.9.jar."
16/05/18 19:31:56 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/05/18 19:31:56 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/05/18 19:32:01 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/05/18 19:32:01 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/05/18 19:32:07 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-core-3.2.10.jar."
16/05/18 19:32:07 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-api-jdo-3.2.6.jar."
16/05/18 19:32:07 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-rdbms-3.2.9.jar."
16/05/18 19:32:07 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/05/18 19:32:08 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/05/18 19:32:12 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/05/18 19:32:12 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.

scala>
like image 194
Yuval Itzchakov Avatar answered Oct 16 '22 19:10

Yuval Itzchakov