I am trying to connect to HDFS protected with Kerberos authentication. I have the following details but don't know how to proceed:
User
Password
Realm
HttpFs Url
I tried the code below but I am getting an authentication error:
from hdfs.ext.kerberos import KerberosClient
import requests
import logging
logging.basicConfig(level=logging.DEBUG)
session = requests.Session()
session.verify = False
client = KerberosClient(url='http://x.x.x.x:abcd', session=session,
                        mutual_auth='REQUIRED', principal='abcdef@LMNOPQ')
print(client.list('/'))
Error
INFO:hdfs.client:Instantiated <KerberosClient(url=http://x.x.x.x:abcd)>.
INFO:hdfs.client:Listing '/'.
DEBUG:hdfs.client:Resolved path '/' to '/'.
DEBUG:hdfs.client:Resolved path '/' to '/'.
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1):
DEBUG:urllib3.connectionpool:http://x.x.x.x:abcd "GET /webhdfs/v1/?op=LISTSTATUS HTTP/1.1" 401 997
DEBUG:requests_kerberos.kerberos_:handle_401(): Handling: 401
ERROR:requests_kerberos.kerberos_:generate_request_header(): authGSSClientInit() failed:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests_kerberos/kerberos_.py", line 213, in generate_request_header
    gssflags=gssflags, principal=self.principal)
kerberos.GSSError: ((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
ERROR:requests_kerberos.kerberos_:((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests_kerberos/kerberos_.py", line 213, in generate_request_header
    gssflags=gssflags, principal=self.principal)
kerberos.GSSError: ((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
DEBUG:requests_kerberos.kerberos_:handle_401(): returning <Response [401]>
DEBUG:requests_kerberos.kerberos_:handle_response(): returning <Response [401]>
I also have the password, but I don't know where to provide it.
The basic flow of typical Kerberos authentication over HTTP is as follows: The client sends an unauthenticated request to the server. The server sends back a 401 response with a WWW-Authenticate: Negotiate header and no authentication details. The client then sends a new request with an Authorization: Negotiate header carrying a token derived from its Kerberos ticket.
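The three steps above can be sketched in a few lines of Python. This is only an illustration of the header exchange, not how requests-kerberos implements it: the function names `wants_negotiate` and `negotiate_header` are made up for this sketch, and the token is a placeholder. In practice the token is produced by GSSAPI from a valid ticket, which is exactly the step that fails in the log above with "No credentials were supplied".

```python
import base64


def wants_negotiate(status, headers):
    """True when a response is the server's 401 challenge asking for Negotiate auth."""
    if status != 401:
        return False
    challenge = headers.get("WWW-Authenticate", "")
    # The header may offer several schemes separated by commas.
    schemes = [part.strip().split(" ")[0].lower() for part in challenge.split(",")]
    return "negotiate" in schemes


def negotiate_header(token):
    """Build the Authorization header for the retried request (token is raw bytes)."""
    return {"Authorization": "Negotiate " + base64.b64encode(token).decode("ascii")}


# Step 2 is the server's challenge; step 3 is the client's retry header.
if wants_negotiate(401, {"WWW-Authenticate": "Negotiate"}):
    # Placeholder bytes: a real token comes from GSSAPI, not from user code.
    print(negotiate_header(b"placeholder-token"))
```

If no ticket is available, there is nothing to put into that token, so the client can only return the original 401 response.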
Hadoop uses Kerberos as the basis for strong authentication and identity propagation for both users and services. Kerberos is a third-party authentication mechanism, in which users and services rely on a third party, the Kerberos server, to authenticate each to the other.
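Because the ticket comes from the Kerberos server and not from the HDFS endpoint, the password is normally not passed to KerberosClient at all. It is used once to obtain a ticket, for example with kinit, and the client then picks the ticket up from the credential cache. Below is a minimal sketch of that step. The helper names `kinit_command` and `obtain_ticket` are hypothetical, and it assumes a `kinit` binary on the PATH that accepts the password on standard input. MIT kinit generally does when stdin is piped; Heimdal's kinit, which ships with macOS, should need `--password-file=STDIN`, so please verify this against your own installation.

```python
import subprocess


def kinit_command(user, realm, heimdal=False):
    """Build the kinit command line for user@REALM."""
    cmd = ["kinit"]
    if heimdal:
        # Assumption: Heimdal's kinit needs this flag to read the password from stdin.
        cmd.append("--password-file=STDIN")
    cmd.append(f"{user}@{realm}")
    return cmd


def obtain_ticket(user, realm, password, heimdal=False, run=subprocess.run):
    """Run kinit, feeding the password on stdin; raise RuntimeError if it fails."""
    result = run(
        kinit_command(user, realm, heimdal),
        input=(password + "\n").encode(),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        raise RuntimeError("kinit failed: " + result.stderr.decode(errors="replace"))


# Usage (needs a reachable KDC, so it is left commented out):
# obtain_ticket("abcdef", "LMNOPQ", "your-password")
```

Once a ticket is in the cache (you can check with klist), the KerberosClient call from the question should be able to complete the Negotiate step, provided the realm and KDC are configured in krb5.conf.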
Kerberos authentication is the security mechanism commonly used to control access to HDFS and Hive. This knowledge base post is intended to provide the details on the configuration steps needed for creating a connection in ETL Validator to Hive using Kerberos as the authentication mechanism.
Let's say your principal is hdfs/[email protected], your keytab file is /var/run/cloudera-scm-agent/process/39-hdfs-NAMENODE/hdfs.keytab, and you want to read an HDFS CSV file that is already available at /hadoop_test_data/filecount.csv. Use the following code and you will get a pandas DataFrame with the contents of filecount.csv.
I used Python 3.7.6 here.
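import subprocess
from csv import reader

import pandas as pd
from krbcontext import krbcontext

try:
    # Obtain a Kerberos ticket from the keytab for the duration of the block.
    with krbcontext(using_keytab=True,
                    principal='hdfs/[email protected]',
                    keytab_file='/var/run/cloudera-scm-agent/process/39-hdfs-NAMENODE/hdfs.keytab') as krb:
        print(krb)
        print('kerberos authentication successful')
        # Read the file through the Hadoop CLI, which uses the ticket obtained above.
        output = subprocess.Popen(["hadoop", "fs", "-cat", "/hadoop_test_data/filecount.csv"], stdout=subprocess.PIPE)
        stdout, stderr = output.communicate()
        # splitlines() handles both LF and CRLF endings and leaves no trailing empty line.
        data = str(stdout, 'utf-8').splitlines()
        # The first line is the header; the remaining lines are the rows.
        df = pd.DataFrame(list(reader(data[1:])), columns=data[0].split(','))
        print(df.shape)
        print(df)
except Exception as e:
    print("Kerberos authentication unsuccessful")
    # str(e) is required: concatenating a str with an exception raises TypeError.
    print("Detailed error is : " + str(e))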
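If you want to check the CSV parsing without a cluster, that step can be pulled out into a small helper. This is a sketch using only the standard library; `parse_csv_bytes` is a name I made up, and the sample bytes stand in for what `hadoop fs -cat` would write to stdout.

```python
import csv
import io


def parse_csv_bytes(raw, encoding="utf-8"):
    """Split CSV bytes into (header, rows); accepts LF or CRLF line endings."""
    text = raw.decode(encoding)
    # newline="" leaves line-ending handling to the csv module.
    rows = [row for row in csv.reader(io.StringIO(text, newline="")) if row]
    if not rows:
        return [], []
    return rows[0], rows[1:]


# Sample bytes standing in for the stdout of `hadoop fs -cat`.
header, rows = parse_csv_bytes(b"name,count\r\na,1\r\nb,2\r\n")
print(header, rows)
```

The result can then be passed to pandas as pd.DataFrame(rows, columns=header).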
Let me know if you wish to know more about it.