 

Hadoop jobs fail when submitted by users other than yarn (MRv2) or mapred (MRv1)

Tags: hadoop, hadoop2

I am running a test cluster with MRv1 (CDH5) paired with LocalFileSystem, and the only user I can run jobs as is mapred (since mapred is the user that starts the jobtracker/tasktracker daemons). When jobs are submitted as any other user, they fail because the jobtracker/tasktracker is unable to find job.jar under the .staging directory.

I have the exact same issue with YARN (MRv2) paired with LocalFileSystem: when jobs are submitted by a user other than 'yarn', the application master is unable to locate job.jar under the .staging directory.

Upon inspecting the .staging directory of the user submitting the job, I found that job.jar exists under the .staging/<job_id>/ directory, but the permissions on both the <job_id> and .staging directories are set to 700 (drwx------), so the application master / tasktracker cannot access job.jar and the supporting files.
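For reference, this is roughly how the inspection looked; the username and job id below are placeholders, and the exact path depends on your hadoop.tmp.dir:

$ ls -ld /tmp/hadoop-someuser/mapred/staging/someuser/.staging
# drwx------ (700): the daemon user (mapred/yarn) cannot descend into it
$ ls -ld /tmp/hadoop-someuser/mapred/staging/someuser/.staging/job_<id>
# same 700 permissions on the job directory holding job.jar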

We are running the test cluster with LocalFileSystem because in our production setup we use only the MapReduce part of the Hadoop project, paired with OCFS.

Any assistance in this regard would be immensely helpful.

asked by Hadoop User


2 Answers

You need to set up a staging directory for each user in the cluster. This is not as complicated as it sounds.

Check the following properties:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <source>core-default.xml</source>
</property>

This sets up a per-user tmp directory; for a user alice, for example, it resolves to /tmp/hadoop-alice.

Tie this to your staging directory:

<property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>${hadoop.tmp.dir}/mapred/staging</value>
    <source>mapred-default.xml</source>
</property>
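With both settings in place, each user gets their own staging root; for a hypothetical user alice it resolves to /tmp/hadoop-alice/mapred/staging. On LocalFileSystem a quick sanity check after a test submission might look like this (example job and paths assumed):

$ sudo -u alice bin/hadoop jar hadoop-examples.jar pi 2 10
$ ls -ld /tmp/hadoop-alice/mapred/staging/alice/.staging
# confirms the per-user staging path built from the two properties above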

Let me know if this works, or if it is already set up this way.

Going by the <source> elements above, these overrides belong in core-site.xml and mapred-site.xml respectively, not yarn-site.xml.
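A minimal sketch of the overrides in the site files (same values as above):

In core-site.xml:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
</property>

In mapred-site.xml:

<property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>${hadoop.tmp.dir}/mapred/staging</value>
</property>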

answered by Venkat


This worked for me; I just set this property in MRv1:

<property>
    <name>hadoop.security.authorization</name>
    <!-- Note: this property is a boolean (true/false); "simple" is
         normally the value of hadoop.security.authentication. -->
    <value>simple</value>
</property>

Please go through this:

Access Control Lists

${HADOOP_CONF_DIR}/hadoop-policy.xml defines an access control list for each Hadoop service. Every access control list has a simple format:

The list of users and the list of groups are both comma-separated lists of names. The two lists are separated by a space.

Example: user1,user2 group1,group2.

Add a blank at the beginning of the line if only a list of groups is to be provided; equivalently, a comma-separated list of users followed by a space or nothing implies only a set of given users.

A special value of * implies that all users are allowed to access the service.
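To make the format concrete, here are the accepted forms of an ACL value (the service name is just an example):

<property>
    <name>security.client.protocol.acl</name>
    <!-- users and groups: <value>user1,user2 group1,group2</value> -->
    <!-- users only:       <value>user1,user2</value> -->
    <!-- groups only (note the leading blank): <value> group1,group2</value> -->
    <value>*</value> <!-- everyone -->
</property>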

Refreshing Service Level Authorization Configuration

The service-level authorization configuration for the NameNode and JobTracker can be changed without restarting either of the Hadoop master daemons. The cluster administrator can change ${HADOOP_CONF_DIR}/hadoop-policy.xml on the master nodes and instruct the NameNode and JobTracker to reload their respective configurations via the -refreshServiceAcl switch to the dfsadmin and mradmin commands respectively.

Refresh the service-level authorization configuration for the NameNode:

$ bin/hadoop dfsadmin -refreshServiceAcl

Refresh the service-level authorization configuration for the JobTracker:

$ bin/hadoop mradmin -refreshServiceAcl

Of course, one can use the security.refresh.policy.protocol.acl property in ${HADOOP_CONF_DIR}/hadoop-policy.xml to restrict access to the ability to refresh the service-level authorization configuration to certain users/groups.
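For instance, a sketch of such a restriction, limiting refreshes to a hypothetical hadoopadmins group (note the leading blank for a groups-only ACL):

<property>
    <name>security.refresh.policy.protocol.acl</name>
    <value> hadoopadmins</value>
</property>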

Examples

Allow only users alice, bob and users in the mapreduce group to submit jobs to the MapReduce cluster:

<property>
     <name>security.job.submission.protocol.acl</name>
     <value>alice,bob mapreduce</value>
</property>

Allow only DataNodes running as the users who belong to the group datanodes to communicate with the NameNode:

<property>
     <name>security.datanode.protocol.acl</name>
     <value>datanodes</value>
</property>

Allow any user to talk to the HDFS cluster as a DFSClient:

<property>
     <name>security.client.protocol.acl</name>
     <value>*</value>
</property>

answered by Mayank Agarwal