
Running Apache Hadoop 2.1.0 on Windows

Tags:

windows

hadoop

I am new to Hadoop and have run into problems trying to run it on my Windows 7 machine. In particular, I am interested in running Hadoop 2.1.0, as its release notes mention that running on Windows is supported. I know that I can try to run 1.x versions on Windows with Cygwin, or even use a prepared VM from, for example, Cloudera, but for several reasons these options are less convenient for me.

Having examined a tarball from http://apache-mirror.rbc.ru/pub/apache/hadoop/common/hadoop-2.1.0-beta/ I found that there really are some *.cmd scripts that can be run without Cygwin. Everything worked fine when I formatted the HDFS partition, but when I tried to run the hdfs namenode daemon I faced two errors. The first, non-fatal, was that winutils.exe could not be found (it really wasn't present in the downloaded tarball). I found the sources of this component in the Apache Hadoop source tree and compiled it with the Microsoft SDK and MSBuild. Thanks to the detailed error message, it was clear where to put the executable to satisfy Hadoop. But the second error, which is fatal, doesn't contain enough information for me to solve it:

    13/09/05 10:20:09 FATAL namenode.NameNode: Exception in namenode join
    java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
        at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:423)
        at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:952)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:451)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:282)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:200)
    ...
    13/09/05 10:20:09 INFO util.ExitUtil: Exiting with status 1

It looks like something else needs to be compiled. I'm going to try to build Hadoop from source with Maven, but isn't there a simpler way? Isn't there some option-I-know-not-of that can disable native code and make that tarball usable on Windows?

Thank you.

UPDATED. Yes, indeed. The "homebrew" package contained some extra files, most importantly winutils.exe and hadoop.dll. With these files the namenode and datanode started successfully. I think the question can be closed. I didn't delete it in case someone faces the same difficulty.
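A quick way to tell whether a given Hadoop folder has the two native files named above is to look for them directly. This is only a minimal sketch: the function name `missing_native_files` is made up for illustration, and it assumes the files are expected in the `bin` subfolder of the Hadoop root, which may not hold for every layout.

```python
import os
from pathlib import Path

# File names come from the update above; the "bin" location is an assumption.
REQUIRED_NATIVE_FILES = ("winutils.exe", "hadoop.dll")


def missing_native_files(hadoop_home):
    """Return the required native file names that are absent from <hadoop_home>/bin."""
    bin_dir = Path(hadoop_home) / "bin"
    return [name for name in REQUIRED_NATIVE_FILES if not (bin_dir / name).is_file()]


if __name__ == "__main__":
    home = os.environ.get("HADOOP_HOME")
    if home is None:
        print("HADOOP_HOME is not set")
    else:
        missing = missing_native_files(home)
        print("missing:", ", ".join(missing) if missing else "none")
```

If `hadoop.dll` is reported missing, the UnsatisfiedLinkError from the question is the likely symptom.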

UPDATED 2. To build the "homebrew" package I did the following:

  1. Got the sources and unpacked them.
  2. Read BUILDING.txt carefully.
  3. Installed dependencies:
    3a) Windows SDK 7.1
    3b) Maven (I used 3.0.5)
    3c) JDK (I used 1.7.25)
    3d) ProtocolBuffer (I used 2.5.0 - http://protobuf.googlecode.com/files/protoc-2.5.0-win32.zip). It is enough just to put the compiler (protoc.exe) into one of the PATH folders.
    3e) A set of UNIX command line tools (I installed Cygwin)
  4. Started the Windows SDK command line: Start | All programs | Microsoft Windows SDK v7.1 | ... Command Prompt (I modified this shortcut, adding the /release option to the command line to build release versions of the native code). All subsequent steps are performed from inside the SDK command line window.
  5. Set up the environment:

    set JAVA_HOME={path_to_JDK_root}

It seems that JAVA_HOME MUST NOT contain spaces!

    set PATH={path_to_maven_bin};%PATH%
    set Platform=x64
    set PATH={path_to_cygwin_bin};%PATH%
    set PATH={path_to_protoc.exe};%PATH%
  6. Changed directory to the sources root folder (BUILDING.txt warns that there are some limitations on path length, so the sources root should have a short name; I used D:\hds)
  7. Ran the build:

    mvn package -Pdist -DskipTests

You can try without 'skipTests', but on my machine some tests failed and the build was terminated. This may be connected to the symbolic link issues mentioned in BUILDING.txt.

  8. Picked up the result in hadoop-dist\target\hadoop-2.1.0-beta (Windows executables and DLLs are in the 'bin' folder)
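The environment requirements from step 5 can be checked before starting a long Maven build. The following is a rough sketch, not part of the Hadoop build itself: `check_build_env` is a hypothetical helper, and it only tests the points mentioned in this thread (no space in JAVA_HOME, a Platform value of x64 or Win32, and `mvn` and `protoc` reachable through PATH).

```python
import os
import shutil


def check_build_env(env, which=shutil.which):
    """Return a list of human-readable problems found in the build environment."""
    problems = []

    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif " " in java_home:
        problems.append("JAVA_HOME contains a space")

    # Values taken from this thread: x64 for 64-bit builds, Win32 for 32-bit builds.
    if env.get("Platform") not in ("x64", "Win32"):
        problems.append("Platform should be x64 or Win32")

    # Only the tools named explicitly in the steps are checked here.
    for tool in ("mvn", "protoc"):
        if which(tool) is None:
            problems.append(tool + " was not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_build_env(os.environ):
        print(problem)
```

Run it from the same SDK command prompt that will run `mvn`, so that it sees the same variables.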

asked Sep 05 '13 07:09 by Hatter


1 Answer

I followed these steps to install Hadoop 2.2.0.

Steps to build Hadoop bin distribution for Windows

  1. Download and install Microsoft Windows SDK v7.1.

  2. Download and install Unix command-line tool Cygwin.

  3. Download and install Maven 3.1.1.

  4. Download Protocol Buffers 2.5.0 and extract to a folder (say c:\protobuf).

  5. Add the environment variables JAVA_HOME, M2_HOME and Platform if they are not already set. Note: the variable name Platform is case sensitive, and its value will be either x64 or Win32 for building on a 64-bit or 32-bit system respectively. Edit the Path variable to add the bin directory of Cygwin (say C:\cygwin64\bin), the bin directory of Maven (say C:\maven\bin) and the installation path of Protocol Buffers (say c:\protobuf).

  6. Download hadoop-2.2.0-src.tar.gz and extract it to a folder with a short path (say c:\hdfs) to avoid runtime problems caused by the maximum path length limitation in Windows.

  7. Select Start --> All Programs --> Microsoft Windows SDK v7.1 and open Windows SDK 7.1 Command Prompt. Change directory to Hadoop source code folder (c:\hdfs). Execute mvn package with options -Pdist,native-win -DskipTests -Dtar to create Windows binary tar distribution.

  8. If everything goes well in the previous step, then native distribution hadoop-2.2.0.tar.gz will be created inside C:\hdfs\hadoop-dist\target\hadoop-2.2.0 directory.

Install Hadoop

  1. Extract hadoop-2.2.0.tar.gz to a folder (say c:\hadoop).

  2. Add Environment Variable HADOOP_HOME and edit Path Variable to add bin directory of HADOOP_HOME (say C:\hadoop\bin).

Configure Hadoop

C:\hadoop\etc\hadoop\core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>

C:\hadoop\etc\hadoop\hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/hadoop/data/dfs/namenode</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/hadoop/data/dfs/datanode</value>
        </property>
    </configuration>

C:\hadoop\etc\hadoop\mapred-site.xml

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

C:\hadoop\etc\hadoop\yarn-site.xml

    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
    </configuration>
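Since all four files share the same `<configuration>/<property>/<name>/<value>` layout, a typo in one of them is easy to miss. As a small sketch (the helper name `read_hadoop_properties` is invented here, and this is not a Hadoop tool), the properties can be read back with Python's standard XML parser to confirm what was actually written:

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Parse a Hadoop *-site.xml document and return its properties as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Same content as the core-site.xml shown above.
CORE_SITE = """<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>"""

print(read_hadoop_properties(CORE_SITE))
# prints {'fs.defaultFS': 'hdfs://localhost:9000'}
```

To check a real file, pass its contents instead of `CORE_SITE`, for example the text read from C:\hadoop\etc\hadoop\hdfs-site.xml.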

Format namenode

For the first time only, namenode needs to be formatted.

    C:\Users\abhijitg>cd c:\hadoop\bin
    c:\hadoop\bin>hdfs namenode -format

Start HDFS (Namenode and Datanode)

    C:\Users\abhijitg>cd c:\hadoop\sbin
    c:\hadoop\sbin>start-dfs

Start MapReduce aka YARN (Resource Manager and Node Manager)

    C:\Users\abhijitg>cd c:\hadoop\sbin
    c:\hadoop\sbin>start-yarn
    starting yarn daemons

A total of four separate Command Prompt windows will open automatically to run the Namenode, Datanode, Resource Manager and Node Manager.
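If the windows open but it is unclear whether the Namenode is really up, one simple check is whether anything accepts TCP connections on the port from fs.defaultFS (9000 in the configuration above). This is a hedged sketch with a hypothetical helper `is_port_open`. A successful connection only shows that some process is listening there, not that HDFS is healthy.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Host and port are taken from fs.defaultFS = hdfs://localhost:9000.
print("Namenode port reachable:", is_port_open("localhost", 9000))
```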

Reference : Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS

answered Oct 02 '22 07:10 by Abhijit