
Unable to run MapReduce wordcount

I am trying to teach myself some Hadoop basics, so I have built a simple Hadoop cluster. It works, and I can put, ls and cat files on HDFS without any issues. So I took the next step and tried to run wordcount on a file I had put into HDFS, but I get the following error:

$ hadoop jar /home/hadoop/share/hadoop/mapreduce/*examples*.jar wordcount data/sectors.txt results
2018-06-06 07:57:36,936 INFO client.RMProxy: Connecting to ResourceManager at ansdb1/
2018-06-06 07:57:37,404 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1528191458385_0014
2018-06-06 07:57:37,734 INFO input.FileInputFormat: Total input files to process : 1
2018-06-06 07:57:37,869 INFO mapreduce.JobSubmitter: number of splits:1
2018-06-06 07:57:37,923 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-06-06 07:57:38,046 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528191458385_0014
2018-06-06 07:57:38,048 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-06-06 07:57:38,284 INFO conf.Configuration: resource-types.xml not found
2018-06-06 07:57:38,284 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-06-06 07:57:38,382 INFO impl.YarnClientImpl: Submitted application application_1528191458385_0014
2018-06-06 07:57:38,445 INFO mapreduce.Job: The url to track the job: http://ansdb1:8088/proxy/application_1528191458385_0014/
2018-06-06 07:57:38,446 INFO mapreduce.Job: Running job: job_1528191458385_0014
2018-06-06 07:57:45,499 INFO mapreduce.Job: Job job_1528191458385_0014 running in uber mode : false
2018-06-06 07:57:45,501 INFO mapreduce.Job:  map 0% reduce 0%
2018-06-06 07:57:45,521 INFO mapreduce.Job: Job job_1528191458385_0014 failed with state FAILED due to: Application application_1528191458385_0014 failed 2 times due to AM Container for appattempt_1528191458385_0014_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2018-06-06 07:57:43.301]Exception from container-launch.
Container id: container_1528191458385_0014_02_000001
Exit code: 1

[2018-06-06 07:57:43.304]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>

[2018-06-06 07:57:43.304]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>

For more detailed output, check the application tracking page: http://ansdb1:8088/cluster/app/application_1528191458385_0014 Then click on links to logs of each attempt.
. Failing the application.
2018-06-06 07:57:45,558 INFO mapreduce.Job: Counters: 0

I have searched many websites, and they seem to say that my environment isn't set up correctly. I have tried many of the suggested fixes, but none of them has worked.

Everything is running on both nodes:

$ jps
31858 ResourceManager
31544 SecondaryNameNode
6152 Jps
31275 DataNode
31132 NameNode
$ ssh ansdb2 jps
16615 NodeManager
21290 Jps
16478 DataNode

I can list files in HDFS:

$ hadoop fs -ls /
Found 3 items
drwxrwxrwt   - hadoop supergroup          0 2018-06-06 07:58 /tmp
drwxr-xr-x   - hadoop supergroup          0 2018-06-05 11:46 /user
drwxr-xr-x   - hadoop supergroup          0 2018-06-05 07:50 /usr

hadoop version:

$ hadoop version
Hadoop 3.1.0
Source code repository https://github.com/apache/hadoop -r 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d
Compiled by centos on 2018-03-30T00:00Z
Compiled with protoc 2.5.0
From source with checksum 14182d20c972b3e2105580a1ad6990
This command was run using /home/hadoop/share/hadoop/common/hadoop-common-3.1.0.jar

hadoop classpath:

$ hadoop classpath

My environment is set up as follows:

# hadoop
## JAVA env variables
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar

## HADOOP env variables
export HADOOP_HOME=/home/hadoop
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop


My Hadoop XML files:


$ cat $HADOOP_HOME/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


$ cat $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


$ cat $HADOOP_HOME/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>


$ cat $HADOOP_HOME/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

I have checked which jar file contains MRAppMaster:

$ find /home/hadoop -name '*.jar' -exec grep -Hls MRAppMaster {} \;

Clearly I am missing something, so could somebody please point me in the right direction?

Asked Jun 06 '18 11:06 by Paul Sowerby

1 Answer

After much googling of the same question phrased in different ways, I found this post: https://mathsigit.github.io/blog_page/2017/11/16/hole-of-submitting-mr-of-hadoop300RC0/ (it's in Chinese). Based on it, I set the following properties in mapred-site.xml:
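The diagnostic in the question's log asks for HADOOP_MAPRED_HOME to be set in three places. In Hadoop's mapred-default.xml, the environment properties for the MapReduce ApplicationMaster, map tasks and reduce tasks are yarn.app.mapreduce.am.env, mapreduce.map.env and mapreduce.reduce.env. As far as I understand it, the containers locate the MapReduce jars (including the one holding MRAppMaster) through HADOOP_MAPRED_HOME, which is why the AM fails with "Could not find or load main class" when it is unset. A sketch of those entries, assuming the Hadoop distribution directory is /home/hadoop (the HADOOP_HOME shown in the question); these are not necessarily the exact values used in this answer:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Sketch only: merge these properties into the existing <configuration>
     element of $HADOOP_HOME/etc/hadoop/mapred-site.xml rather than adding
     a second one. /home/hadoop is an assumed distribution directory. -->
<configuration>
  <!-- Environment for the ApplicationMaster container (runs MRAppMaster) -->
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop</value>
  </property>
  <!-- Environment for map task containers -->
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop</value>
  </property>
  <!-- Environment for reduce task containers -->
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop</value>
  </property>
</configuration>
```

These are job-level settings, so the mapred-site.xml on the machine submitting the job (ansdb1 here) is presumably the one that matters; keeping the file identical on both nodes is the simplest way to avoid surprises.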


With those properties set, everything works.

Answered Oct 27 '22 15:10 by Paul Sowerby