Connect to HDFS with Kerberos Authentication using Python

I am trying to connect to HDFS protected with Kerberos authentication. I have the following details but don't know how to proceed:

User
Password
Realm
HttpFs Url

I tried the code below, but I get an authentication error:

from hdfs.ext.kerberos import KerberosClient
import requests
import logging

logging.basicConfig(level=logging.DEBUG)

session = requests.Session()
session.verify = False

client = KerberosClient(url='http://x.x.x.x:abcd', session=session,
                        mutual_auth='REQUIRED', principal='abcdef@LMNOPQ')

print(client.list('/'))

Error

INFO:hdfs.client:Instantiated <KerberosClient(url=http://x.x.x.x:abcd)>.
INFO:hdfs.client:Listing '/'.
DEBUG:hdfs.client:Resolved path '/' to '/'.
DEBUG:hdfs.client:Resolved path '/' to '/'.
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 
DEBUG:urllib3.connectionpool:http://x.x.x.x:abcd "GET /webhdfs/v1/?op=LISTSTATUS HTTP/1.1" 401 997
DEBUG:requests_kerberos.kerberos_:handle_401(): Handling: 401
ERROR:requests_kerberos.kerberos_:generate_request_header(): authGSSClientInit() failed:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests_kerberos/kerberos_.py", line 213, in generate_request_header
gssflags=gssflags, principal=self.principal)
kerberos.GSSError: ((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
ERROR:requests_kerberos.kerberos_:((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests_kerberos/kerberos_.py", line 213, in generate_request_header
gssflags=gssflags, principal=self.principal)
kerberos.GSSError: ((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
DEBUG:requests_kerberos.kerberos_:handle_401(): returning <Response [401]>
DEBUG:requests_kerberos.kerberos_:handle_response(): returning <Response [401]>

I also have the password, but I don't know where to provide it.

asked Jul 15 '19 04:07 by ankit



1 Answer

Suppose your principal is hdfs/[email protected], your keytab file is /var/run/cloudera-scm-agent/process/39-hdfs-NAMENODE/hdfs.keytab, and you want to read an HDFS CSV file located at /hadoop_test_data/filecount.csv. The following code authenticates with the keytab and gives you a pandas DataFrame with the contents of filecount.csv.

I used Python 3.7.6 here.

from csv import reader
import subprocess

import pandas as pd
from krbcontext import krbcontext

try:
    # Obtain a Kerberos ticket from the keytab for the duration of the block
    with krbcontext(using_keytab=True,
                    principal='hdfs/[email protected]',
                    keytab_file='/var/run/cloudera-scm-agent/process/39-hdfs-NAMENODE/hdfs.keytab') as krb:
        print(krb)
        print('kerberos authentication successful')
        # Read the file through the Hadoop CLI, which picks up the ticket
        output = subprocess.Popen(["hadoop", "fs", "-cat", "/hadoop_test_data/filecount.csv"],
                                  stdout=subprocess.PIPE)
        stdout, stderr = output.communicate()
        # splitlines() handles both \n and \r\n line endings
        data = str(stdout, 'utf-8').splitlines()
        # The first line is the header; the remaining lines are the rows
        df = pd.DataFrame(list(reader(data[1:])), columns=data[0].split(','))
        print(df.shape)
        print(df)

except Exception as e:
    print("Kerberos authentication unsuccessful")
    # Convert the exception to text before concatenating
    print("Detailed error is : " + str(e))
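
The parsing step in the code above can be checked on its own, without a Hadoop cluster. The following is a minimal sketch, assuming the file is comma-separated with a header row; `parse_csv_bytes` is a hypothetical helper name used only for illustration, and the sample bytes are made up.

```python
import csv


def parse_csv_bytes(raw, encoding="utf-8"):
    """Split raw CSV bytes into (header, rows)."""
    # splitlines() accepts both \n and \r\n endings and does not leave
    # a trailing empty string behind
    lines = raw.decode(encoding).splitlines()
    # csv.reader yields an empty list for a blank line; skip those
    records = [row for row in csv.reader(lines) if row]
    if not records:
        return [], []
    return records[0], records[1:]


# Made-up sample standing in for the stdout of `hadoop fs -cat`
header, rows = parse_csv_bytes(b"name,count\r\nlogs,3\ntmp,7\n")
print(header)
print(rows)
```

With pandas installed, `pd.DataFrame(rows, columns=header)` should then build the same kind of frame as in the answer. Using `csv.reader` for the header as well also copes with quoted column names, which a plain `split(',')` does not.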

Let me know if you wish to know more about it.
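
For the HttpFS route from the question: the `GSSError` there ("No credentials were supplied...") typically means there is no Kerberos ticket in the credential cache. As far as I know, `KerberosClient` takes no password argument and relies on a ticket obtained beforehand, for example with `kinit`. A rough sketch under those assumptions follows. The helper names (`build_kinit_command`, `obtain_ticket`, `list_root`) and the keytab path are hypothetical, and whether `kinit` accepts a password on stdin depends on your Kerberos implementation, so treat that branch as an assumption to verify.

```python
import subprocess


def build_kinit_command(principal, keytab=None):
    """Return the kinit argument list; a keytab avoids the password prompt."""
    if keytab:
        return ["kinit", "-kt", keytab, principal]
    return ["kinit", principal]


def obtain_ticket(principal, keytab=None, password=None):
    """Run kinit so a ticket lands in the default credential cache."""
    if keytab is None and password is None:
        raise ValueError("either a keytab or a password is required")
    # Assumption: when no keytab is used, this kinit accepts the password
    # on stdin. That varies by implementation, so check your man page.
    stdin_data = None if keytab else (password + "\n").encode()
    subprocess.run(build_kinit_command(principal, keytab),
                   input=stdin_data, check=True)


def list_root(url, principal, keytab=None, password=None):
    """Get a ticket, then list '/' through WebHDFS/HttpFS."""
    # Imported here so the helpers above work without the package installed
    from hdfs.ext.kerberos import KerberosClient
    obtain_ticket(principal, keytab, password)
    return KerberosClient(url).list("/")


# Placeholder principal from the question; the keytab path is hypothetical
print(build_kinit_command("abcdef@LMNOPQ", "/path/to/user.keytab"))
```

Calling `list_root('http://x.x.x.x:abcd', 'abcdef@LMNOPQ', password=...)` would then try the listing from the question with a ticket in place. Running `klist` afterwards is a quick way to confirm that a ticket was actually issued.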

answered Oct 08 '22 06:10 by Guru