
How to find JAR: /home/hadoop/contrib/streaming/hadoop-streaming.jar

I'm working through a Pluralsight video tutorial on Amazon EMR, and I'm stuck because I get this error:

Not a valid JAR: /home/hadoop/contrib/streaming/hadoop-streaming.jar

Please note that the tutorial is old and uses an older EMR version, while I am using the latest version. Is that a problem?

After entering my credentials in PuTTY, these are the steps I took:

1) Hadoop

2) mkdir streamingCode

3) wget -o ./streamingCode/wordSplitter.py s3://elasticmapreduce/samples/wordcount/wordSplitter.py

4) hadoop jar contrib/streaming/hadoop-streaming.jar -files streamingCode/wordSplitter.py -mapper wordSplitter.py -input s3://elasticmapreduce/samples/wordcount/input -output streamingCode/wordCountOut -reducer aggregate

I cannot run step 4 because I get the following error:

Not a valid JAR: /home/hadoop/contrib/streaming/hadoop-streaming.jar

asked Sep 12 '15 at 21:09 by harshil bhatt


People also ask

Where can I find Hadoop streaming jar?

You can find the streaming jar in /usr/hdp/current/hadoop-mapreduce-client. Make sure the MapReduce, HDFS, and YARN clients are installed on your machine.

How do I run a jar file in Hadoop?

To do this, add a package name to your .java file that matches its directory structure, for example home.hduser.dir, and when running the hadoop jar command, specify the class name including its package, for example home.
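The package-to-directory rule can be illustrated with a small bash helper. This is only a sketch and not part of any Hadoop tooling: the function name `class_name_from_path`, the `src` source root, and the class `MyJob` are hypothetical. It turns a `.java` path under a source root into the dotted, package-qualified class name you would pass to `hadoop jar`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a fully qualified Java class name from a
# source file path, following the rule that the package name mirrors the
# directory structure under the source root.
class_name_from_path() {
  local src_root="${1%/}" file="$2"
  local rel="${file#"$src_root"/}"   # strip the source root prefix
  rel="${rel%.java}"                 # drop the .java extension
  printf '%s\n' "${rel//\//.}"       # turn directory separators into dots
}
```

For example, `class_name_from_path src src/home/hduser/dir/MyJob.java` prints `home.hduser.dir.MyJob`, which could then be used as `hadoop jar myjob.jar home.hduser.dir.MyJob input output` (jar name and arguments hypothetical).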

Which tool is used to transfer streaming data into Hadoop?

Apache Flume, a tool for transferring data into Hadoop.

What is the mapper used in Hadoop streaming?

In Hadoop Streaming, the mapper and the reducer are scripts or executables that read input line by line from stdin and emit output to stdout. The streaming utility creates a MapReduce job, submits it to an appropriate cluster, and monitors the job's progress until it completes.
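The stdin/stdout contract can be tried locally without a cluster. The bash sketch below is a stand-in, not the tutorial's wordSplitter.py: `mapper` emits `word<TAB>1` lines, `sort` plays the role of Hadoop's shuffle-and-sort phase, and `reducer` sums the counts for each key. The function names `mapper`, `reducer`, and `local_streaming_run` are hypothetical.

```shell
#!/usr/bin/env bash
# Mapper: read lines from stdin, print "word<TAB>1" for every
# whitespace-separated word.
mapper() {
  awk '{ for (i = 1; i <= NF; i++) print $i "\t1" }'
}

# Reducer: input arrives sorted, so equal keys are adjacent; sum the
# counts per key and print "word<TAB>total".
reducer() {
  awk -F'\t' '
    # A new key starts: flush the previous one (prev "" forces string comparison).
    NR == 1 || $1 != prev "" {
      if (NR > 1) print prev "\t" sum
      prev = $1
      sum = 0
    }
    { sum += $2 }
    END { if (NR > 0) print prev "\t" sum }
  '
}

# Simulate map -> shuffle/sort -> reduce on stdin.
local_streaming_run() {
  mapper | LC_ALL=C sort | reducer
}
```

Running `printf 'the cat\nthe dog\n' | local_streaming_run` prints `cat 1`, `dog 1`, and `the 2` (tab-separated). Saved as standalone executable scripts, a mapper and reducer of this shape could in principle be passed to the streaming jar with -mapper and -reducer.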


2 Answers

The Hadoop streaming jar is still available in the latest release of EMR Hadoop. Starting with EMR release 4.0.0, it is located at /usr/lib/hadoop-mapreduce/hadoop-streaming.jar.

Another good resource on the differences between EMR release versions is http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-release-differences.html.
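Applied to the question, step 4 would point at the EMR 4.0.0+ location instead of contrib/streaming. The bash sketch below keeps the question's arguments, written with `-input`, and only swaps the jar path. `STREAMING_JAR`, `HADOOP_BIN`, and `run_wordcount` are hypothetical names; `HADOOP_BIN` exists only so the command can be dry-run with `echo`. It has not been checked against a specific EMR release.

```shell
#!/usr/bin/env bash
# Jar location for EMR release 4.0.0 and later (overridable for testing).
STREAMING_JAR="${STREAMING_JAR:-/usr/lib/hadoop-mapreduce/hadoop-streaming.jar}"

run_wordcount() {
  # Fail early with a clear message instead of "Not a valid JAR".
  if [ ! -f "$STREAMING_JAR" ]; then
    echo "streaming jar not found: $STREAMING_JAR" >&2
    return 1
  fi
  # Generic options such as -files come before the streaming options.
  "${HADOOP_BIN:-hadoop}" jar "$STREAMING_JAR" \
    -files streamingCode/wordSplitter.py \
    -mapper wordSplitter.py \
    -input s3://elasticmapreduce/samples/wordcount/input \
    -output streamingCode/wordCountOut \
    -reducer aggregate
}
```

On the master node you would call `run_wordcount`; `HADOOP_BIN=echo run_wordcount` prints the command without submitting a job.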

answered Sep 21 '22 at 12:09 by ChristopherB


For the HADOOP_STREAMING variable, obtaining the path is a bit more complicated and depends on which HDP version you are using.

Search for its location with the command: find / -name 'hadoop-streaming*.jar'

Src: http://thecoatlessprofessor.com/programming/installing-r-studio-server-on-hortonworks-virtual-box-image-and-rmr2-a-k-a-rhadoop-r-package/
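The same search can be wrapped in a small bash function whose result is stored in HADOOP_STREAMING. This is a sketch under assumptions: the name `find_streaming_jar` is hypothetical, and it simply takes the first match, which on some installs may be a versioned, `-sources`, or `-tests` jar, so inspect the full `find` output if several files appear.

```shell
#!/usr/bin/env bash
# Print the first hadoop-streaming*.jar found under a directory.
# Defaults to "/" like the find command above; a narrower root such as
# /usr is much faster.
find_streaming_jar() {
  local root="${1:-/}"
  # 2>/dev/null hides "Permission denied" noise from unreadable directories.
  find "$root" -name 'hadoop-streaming*.jar' 2>/dev/null | head -n 1
}
```

Typical use: `export HADOOP_STREAMING="$(find_streaming_jar /usr)"`, then check the value with `echo "$HADOOP_STREAMING"` before using it.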

answered Sep 18 '22 at 12:09 by Nikhil B Agarwal