
Hadoop Configuration on Windows through Cygwin

I am trying to configure Hadoop on my Windows 7 machine. I can start the name node and the other services, but when I run an example that ships with the Hadoop package (version 1.0.3), I get the following error:

bin/hadoop: line 320 : C:\Program: Command not found. 

I ran the example with the following command:

bin/hadoop jar hadoop-examples-1.0.3.jar pi 10

I opened the bin/hadoop script where the error occurs and found that line 320 builds the JAVA_PLATFORM value:

JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} -Xmx32m ${HADOOP_JAVA_PLATFORM_OPTS} org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`

So I suspect the problem is in the JAVA variable, since Cygwin uses different conventions for path names. Has anyone else faced this problem, or does anyone know what is causing it?

Manish asked Sep 11 '12 21:09


1 Answer

Quick summary:

  • The hadoop bash script at (path)/bin/hadoop has a bug: it assumes that none of the files or paths Hadoop needs contain spaces. On Windows that assumption often fails, because a JDK installed in the default location lives under "Program Files", which has a space in it.

Details

This is a tricky one... I ran into the same problem and it took me a while to fix.

First, the problem: setting and using environment variables in shell scripts gets sketchy when the file paths or names contain spaces (which occurs fairly often on non-*nix systems these days).

Next, there are likely two places where you need to fix the problem:

  1. In your (path)/conf/hadoop-env.sh script, you should be setting the JAVA_HOME variable, and it should look something like:

    export JAVA_HOME=/cygdrive/c/"Program Files"/Java/jdk1.7.0_06
    

    (Note the quotation marks around "Program Files", so that it is recognized as a single element. In my experience the \ escape character did not work here, apparently because Cygwin does some finagling of Windows to UNIX paths.)

  2. In your (path)/bin/hadoop script, line 320 is likely written something like the following:

    JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} -Xmx32m ${HADOOP_JAVA_PLATFORM_OPTS} org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`
    

    You will need to change it to instead say:

    JAVA_PLATFORM=`CLASSPATH="${CLASSPATH}" "${JAVA}" -Xmx32m ${HADOOP_JAVA_PLATFORM_OPTS} org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`
    

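    Note that I have added quotation marks around the variables ${CLASSPATH} and ${JAVA}. The quotation marks tell the shell to treat the entire value of the variable as a single word instead of splitting it at each space. Without them, a JAVA value that starts with C:\Program Files is split at the space and the shell tries to run C:\Program as a command, which matches the error in the question.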
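The effect of the quotation marks can be checked without Hadoop or Java installed. The following is a minimal sketch: the JAVA value is a made-up stand-in for a JDK path, and count_words is a hypothetical helper written for this illustration, not part of the hadoop script.

```shell
# Hypothetical stand-in for the JDK path; no real Java install is needed.
JAVA="/cygdrive/c/Program Files/Java/bin/java"

# Helper for this illustration only: print how many words the shell passed in.
count_words() {
  echo "$#"
}

# Unquoted: the shell splits the value at the space, so a command line
# built this way starts with ".../Program" as the command name.
unquoted_count=$(count_words ${JAVA})

# Quoted: the whole value stays a single word.
quoted_count=$(count_words "${JAVA}")

echo "unquoted: ${unquoted_count} word(s)"
echo "quoted:   ${quoted_count} word(s)"
```

With the shell's default field splitting, the unquoted expansion yields two words and the quoted one yields a single word, which is the difference between the original line 320 and the fixed version.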


OK, now if you care to understand why this is happening: your JDK is likely stored under "Program Files", or maybe under "Program Files (x86)", both of which have spaces in the path. None of the other environment variables that Hadoop needs depend on anything under "Program Files", so that's why you only see the one error being flagged. The other variables are missing the quotes too, but their values simply don't contain spaces.
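To see which of your own settings are at risk, you can test each value for a space. This is a rough sketch: the HADOOP_HOME value is an assumed example location, and has_space and report are hypothetical helpers written for this illustration.

```shell
# Example values; substitute your own. The HADOOP_HOME location is assumed.
JAVA_HOME="/cygdrive/c/Program Files/Java/jdk1.7.0_06"
HADOOP_HOME="/cygdrive/c/hadoop-1.0.3"

# Helper for this illustration only: succeed if the value contains a space.
has_space() {
  case "$1" in
    *" "*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Helper for this illustration only: print a one-line verdict for a variable.
report() {
  if has_space "$2"; then
    echo "$1 contains a space: quote it wherever it is expanded"
  else
    echo "$1 has no spaces"
  fi
}

report JAVA_HOME "$JAVA_HOME"
report HADOOP_HOME "$HADOOP_HOME"
```

Any variable reported as containing a space will break an unquoted expansion in the same way ${JAVA} does on line 320.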

Mike Williamson answered Oct 12 '22 13:10