I am trying to teach myself some Hadoop basics, so I have built a simple Hadoop cluster. It works: I can put, ls, and cat on the HDFS filesystem without any issues. So I took the next step and tried to run a wordcount on a file I had put into HDFS, but I get the following error:
$ hadoop jar /home/hadoop/share/hadoop/mapreduce/*examples*.jar wordcount data/sectors.txt results
2018-06-06 07:57:36,936 INFO client.RMProxy: Connecting to ResourceManager at ansdb1/10.49.17.12:8040
2018-06-06 07:57:37,404 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1528191458385_0014
2018-06-06 07:57:37,734 INFO input.FileInputFormat: Total input files to process : 1
2018-06-06 07:57:37,869 INFO mapreduce.JobSubmitter: number of splits:1
2018-06-06 07:57:37,923 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-06-06 07:57:38,046 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528191458385_0014
2018-06-06 07:57:38,048 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-06-06 07:57:38,284 INFO conf.Configuration: resource-types.xml not found
2018-06-06 07:57:38,284 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-06-06 07:57:38,382 INFO impl.YarnClientImpl: Submitted application application_1528191458385_0014
2018-06-06 07:57:38,445 INFO mapreduce.Job: The url to track the job: http://ansdb1:8088/proxy/application_1528191458385_0014/
2018-06-06 07:57:38,446 INFO mapreduce.Job: Running job: job_1528191458385_0014
2018-06-06 07:57:45,499 INFO mapreduce.Job: Job job_1528191458385_0014 running in uber mode : false
2018-06-06 07:57:45,501 INFO mapreduce.Job: map 0% reduce 0%
2018-06-06 07:57:45,521 INFO mapreduce.Job: Job job_1528191458385_0014 failed with state FAILED due to: Application application_1528191458385_0014 failed 2 times due to AM Container for appattempt_1528191458385_0014_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2018-06-06 07:57:43.301]Exception from container-launch.
Container id: container_1528191458385_0014_02_000001
Exit code: 1
[2018-06-06 07:57:43.304]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
[2018-06-06 07:57:43.304]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
For more detailed output, check the application tracking page: http://ansdb1:8088/cluster/app/application_1528191458385_0014 Then click on links to logs of each attempt.
. Failing the application.
2018-06-06 07:57:45,558 INFO mapreduce.Job: Counters: 0
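(For context, the wordcount example being run here just tokenizes each line and sums a count per word; a minimal local Python sketch of the same computation, independent of Hadoop:)

```python
# Local sketch of what the wordcount example computes:
# the "map" phase emits (word, 1) pairs per line; the shuffle/"reduce"
# phase sums the counts per word. Counter does both steps in-process.
from collections import Counter

def wordcount(lines):
    counts = Counter()
    for line in lines:               # map: tokenize each input line
        counts.update(line.split())  # reduce: accumulate counts per word
    return dict(counts)
```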
I have searched lots of websites, and they seem to say that my environment isn't right. I have tried many of the suggested fixes, but nothing has worked.
Everything is running on both nodes:
$ jps
31858 ResourceManager
31544 SecondaryNameNode
6152 Jps
31275 DataNode
31132 NameNode
$ ssh ansdb2 jps
16615 NodeManager
21290 Jps
16478 DataNode
I can ls HDFS:
$ hadoop fs -ls /
Found 3 items
drwxrwxrwt - hadoop supergroup 0 2018-06-06 07:58 /tmp
drwxr-xr-x - hadoop supergroup 0 2018-06-05 11:46 /user
drwxr-xr-x - hadoop supergroup 0 2018-06-05 07:50 /usr
hadoop version:
$ hadoop version
Hadoop 3.1.0
Source code repository https://github.com/apache/hadoop -r 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d
Compiled by centos on 2018-03-30T00:00Z
Compiled with protoc 2.5.0
From source with checksum 14182d20c972b3e2105580a1ad6990
This command was run using /home/hadoop/share/hadoop/common/hadoop-common-3.1.0.jar
hadoop classpath:
$ hadoop classpath
/home/hadoop/etc/hadoop:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/*:/home/hadoop/share/hadoop/hdfs:/home/hadoop/share/hadoop/hdfs/lib/*:/home/hadoop/share/hadoop/hdfs/*:/home/hadoop/share/hadoop/mapreduce/*:/home/hadoop/share/hadoop/yarn:/home/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/share/hadoop/yarn/*
My environment is set up:
# hadoop
## JAVA env variables
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-7.b10.el7.x86_64
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
## HADOOP env variables
export HADOOP_HOME=/home/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
My Hadoop XML files:
core-site.xml:
$ cat $HADOOP_HOME/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ansdb1:9000/</value>
</property>
</configuration>
hdfs-site.xml:
$ cat $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/data/hadoop/datanode</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data/hadoop/namenode</value>
</property>
<property>
<name>dfs.checkpoint.dir</name>
<value>/data/hadoop/secondarynamenode</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
yarn-site.xml:
$ cat $HADOOP_HOME/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ansdb1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>ansdb1:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>ansdb1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>ansdb1:8040</value>
</property>
</configuration>
mapred-site.xml:
$ cat $HADOOP_HOME/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
I have checked which jar file contains MRAppMaster:
$ find /home/hadoop -name '*.jar' -exec grep -Hls MRAppMaster {} \;
/home/hadoop/share/hadoop/mapreduce/sources/hadoop-mapreduce-client-app-3.1.0-sources.jar
/home/hadoop/share/hadoop/mapreduce/sources/hadoop-mapreduce-client-app-3.1.0-test-sources.jar
/home/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-3.1.0.jar
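(The same check can also be done by listing a jar's entries directly, since a jar is just a zip archive; a small sketch, where the helper name is mine and the jar path is the one found above:)

```python
import zipfile

def jar_contains_class(jar_path, simple_name):
    """Return True if the jar holds a class file with this simple name,
    e.g. simple_name='MRAppMaster' matches '.../v2/app/MRAppMaster.class'."""
    with zipfile.ZipFile(jar_path) as jar:
        return any(n.endswith("/" + simple_name + ".class")
                   for n in jar.namelist())

# e.g. jar_contains_class(
#     "/home/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-3.1.0.jar",
#     "MRAppMaster")
```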
Clearly I am missing something, so could somebody please point me in the right direction?
After much googling of the same question asked in different ways, I found this: https://mathsigit.github.io/blog_page/2017/11/16/hole-of-submitting-mr-of-hadoop300RC0/ (it's in Chinese). So I set the following properties in mapred-site.xml:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
And everything works.
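(Note that the error message itself suggests using the full path of the Hadoop distribution directory. `$HADOOP_HOME` in the value above only works if that variable is visible to the shell that expands the container environment; if it isn't, the literal path is the safer value. Using the install path from this question, `/home/hadoop`, the same three properties would read:)

```xml
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/home/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/home/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/home/hadoop</value>
</property>
```

These properties are part of the job configuration read at submission time, so rerunning the job should pick them up.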