 

HDFS access from remote host through Java API, user authentication

I need to use an HDFS cluster from a remote desktop through the Java API. Everything works fine until it comes to write access: if I try to create any file, I receive an access permission exception. The path looks good, but the exception indicates my remote desktop user name, which is of course not the one I need to access the target HDFS directory.

The questions are:

  • Is there any way to present a different user name using 'simple' authentication in the Java API?
  • Could you point me to a good explanation of authentication/authorization schemes in Hadoop/HDFS, preferably with Java API examples?

Yes, I already know that 'whoami' could be overloaded in this case using a shell alias, but I would prefer to avoid solutions like that. Another specific here is that I dislike tricks like pipes through SSH and scripts; I'd like to do everything using just the Java API. Thank you in advance.

Roman Nikitchenko asked Apr 11 '13
1 Answer

After some studying I came to the following solution:

  • I don't actually need a full Kerberos solution; it is currently enough that clients can run HDFS requests from any user. The environment itself is considered secure.
  • This gives me a solution based on the Hadoop UserGroupInformation class. In the future I can extend it to support Kerberos.

Sample code, probably useful for people both for 'fake authentication' and for remote HDFS access:

package org.myorg;

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;

public class HdfsTest {

    public static void main(String[] args) {

        try {
            // Impersonate the 'hbase' user; with 'simple' authentication
            // no password or ticket is required.
            UserGroupInformation ugi
                = UserGroupInformation.createRemoteUser("hbase");

            ugi.doAs(new PrivilegedExceptionAction<Void>() {

                public Void run() throws Exception {

                    Configuration conf = new Configuration();
                    conf.set("fs.defaultFS", "hdfs://1.2.3.4:8020/user/hbase");
                    conf.set("hadoop.job.ugi", "hbase");

                    FileSystem fs = FileSystem.get(conf);

                    // Write access now succeeds as user 'hbase'.
                    fs.createNewFile(new Path("/user/hbase/test"));

                    // List the directory to verify the file was created.
                    FileStatus[] status = fs.listStatus(new Path("/user/hbase"));
                    for (FileStatus s : status) {
                        System.out.println(s.getPath());
                    }
                    return null;
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
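
If you later need the Kerberos extension mentioned above, the same pattern carries over. Below is a minimal sketch, assuming a hypothetical principal name, keytab path, and NameNode address (replace all three with your own); it is not runnable without a live KDC and a kerberized cluster:

```java
package org.myorg;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HdfsKerberosSketch {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's.
        conf.set("fs.defaultFS", "hdfs://1.2.3.4:8020");
        // Switch the client from 'simple' to Kerberos authentication.
        conf.set("hadoop.security.authentication", "kerberos");

        // UserGroupInformation must see the kerberized configuration
        // before any login attempt.
        UserGroupInformation.setConfiguration(conf);

        // Hypothetical principal and keytab; a keytab login avoids
        // interactive password entry on the remote desktop.
        UserGroupInformation.loginUserFromKeytab(
            "hbase@EXAMPLE.COM", "/etc/security/keytabs/hbase.keytab");

        // All subsequent filesystem calls run as the logged-in principal.
        FileSystem fs = FileSystem.get(conf);
        fs.createNewFile(new Path("/user/hbase/test-kerberos"));
    }
}
```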

Useful reference for those who have a similar problem:

  • Cloudera blog post "Authorization and Authentication in Hadoop". Short and focused on a simple explanation of Hadoop security approaches. No information specific to a Java API solution, but good for a basic understanding of the problem.

UPDATE:
An alternative for those who use the command-line hdfs or hadoop utility and don't need a matching local user:

 HADOOP_USER_NAME=hdfs hdfs dfs -put /root/MyHadoop/file1.txt /

What you actually do is read the local file according to your local permissions, but when placing the file on HDFS you are authenticated as the user hdfs.

This has pretty similar properties to the API code illustrated above:

  1. You don't need sudo.
  2. You don't actually need an appropriate local user 'hdfs'.
  3. You don't need to copy anything or change permissions, because of the previous points.
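
The mechanism behind this is plain per-command environment variable assignment, which you can demonstrate without any cluster (a minimal sketch; the hdfs command itself is not invoked here):

```shell
# A variable set before a command is exported to that command only;
# the surrounding shell session is left unchanged.
HADOOP_USER_NAME=hdfs sh -c 'echo "$HADOOP_USER_NAME"'   # prints: hdfs
echo "${HADOOP_USER_NAME:-unset}"                         # prints: unset
```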
Roman Nikitchenko answered Nov 04 '22