Connect to Impala using impyla client with Kerberos auth

I'm on a Windows 8 machine, where I use Python (Anaconda distribution) to connect to Impala in our Hadoop cluster with the Impyla package. Our Hadoop cluster is secured with Kerberos. I followed the API reference on how to configure the connection:

    from impala.dbapi import connect
    conn = connect(host='localhost', port=21050, auth_mechanism='GSSAPI',
                   kerberos_service_name='impala')
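The Kerberos-specific keyword arguments can be collected in one place. The sketch below uses a hypothetical helper, `kerberos_connect_kwargs` (not part of impyla), that only builds the dictionary and does not import impyla, so it can be checked without a cluster. The keyword names themselves (`auth_mechanism`, `kerberos_service_name`, `use_ssl`, `database`) are the ones used in the `connect()` calls on this page.

```python
def kerberos_connect_kwargs(host, port=21050, service_name="impala",
                            use_ssl=False, database=None):
    """Hypothetical helper: build keyword arguments for impala.dbapi.connect()
    on a Kerberos-secured cluster (GSSAPI over SASL)."""
    if not host:
        raise ValueError("host must be a non-empty string")
    kwargs = {
        "host": host,
        "port": port,                            # impalad HiveServer2 port
        "auth_mechanism": "GSSAPI",              # Kerberos through SASL
        "kerberos_service_name": service_name,   # first part of impala/<host>@REALM
        "use_ssl": use_ssl,
    }
    if database is not None:
        kwargs["database"] = database
    return kwargs
```

Usage would be `conn = connect(**kerberos_connect_kwargs("impalad.example.com"))`, where the host name is a placeholder. With Kerberos the host is normally the impalad's fully qualified name rather than `localhost`, because the client builds the service principal it asks for from that host name.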

We use Kerberos (GSSAPI) over SASL:

    auth_mechanism='GSSAPI'

I managed to install the python-sasl library on Windows 8, but I still get this error:

    Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found (code THRIFTTRANSPORT): TTransportException('Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found',)

I wonder if I am still missing some dependencies.

asked Jan 24 '16 15:01 by richban


2 Answers

I ran into the same issue and fixed it by installing the right versions of the required libraries.

Install the following Python libraries with pip:

    six==1.12.0
    bit_array==0.1.0
    thrift==0.9.3
    thrift_sasl==0.2.1
    sasl==0.2.1
    impyla==0.13.8
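To see whether an environment actually matches these pins, you can compare installed versions from Python. This is a minimal sketch that assumes Python 3.8+ (for `importlib.metadata`), so it targets a newer interpreter than the 2.7/3.4 mentioned in this answer; it only reports versions and installs nothing. `None` means the distribution was not found under that name.

```python
from importlib import metadata  # Python 3.8+

# Versions pinned in this answer, keyed by PyPI distribution name.
PINNED = {
    "six": "1.12.0",
    "bit_array": "0.1.0",
    "thrift": "0.9.3",
    "thrift_sasl": "0.2.1",
    "sasl": "0.2.1",
    "impyla": "0.13.8",
}


def check_pins(pins):
    """Return {name: (installed_version_or_None, pinned_version)}."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (installed, wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted) in check_pins(PINNED).items():
        status = "OK" if installed == wanted else "MISMATCH"
        print(f"{name:12} installed={installed} pinned={wanted} {status}")
```

This only covers the Python side. A missing native SASL GSSAPI plugin would not show up here.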

The code below works with Python 2.7 and 3.4:

    import subprocess
    from impala.dbapi import connect

    # Obtain a Kerberos ticket first (kinit prompts for your password);
    # check_call raises if kinit fails instead of silently continuing.
    subprocess.check_call(["kinit"])

    conn = connect(host='hostname.io', port=21050, use_ssl=True,
                   database='default', user='urusername',  # your user name
                   kerberos_service_name='impala', auth_mechanism='GSSAPI')
    cur = conn.cursor()
    cur.execute('SHOW DATABASES;')
    result = cur.fetchall()
    for data in result:
        print(data)
answered Oct 30 '22 14:10 by Sumit Kumar


Try this to list the tables on a Kerberized cluster (in my case CDH 5.14.2-1).

Make sure you have a valid ticket before running this code.
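You can check for a ticket from Python before connecting. This sketch assumes an MIT Kerberos `klist` on the `PATH`, whose `-s` flag prints nothing and exits with status 0 only when a credentials cache exists and its tickets have not expired. The built-in Windows `klist.exe` does not behave this way. `has_valid_ticket` is a hypothetical helper name.

```python
import shutil
import subprocess


def has_valid_ticket(klist="klist"):
    """Hypothetical helper: return True if `klist -s` finds unexpired tickets.

    Assumes MIT Kerberos klist semantics for -s (silent, exit status 0 on
    success). Returns False if the klist binary cannot be found.
    """
    if shutil.which(klist) is None:
        return False
    result = subprocess.run([klist, "-s"],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


if __name__ == "__main__":
    if has_valid_ticket():
        print("Kerberos ticket found.")
    else:
        print("No valid Kerberos ticket: run kinit first.")
```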

I used Python 2.7 with the following packages:

    thrift-0.9.3
    thriftpy-0.3.8
    thrift_sasl-0.3.0
    impyla==0.14.2.2

Working code:

    from impala.dbapi import connect
    # as_pandas(cursor) is an alternative to fetchall() that returns a
    # pandas DataFrame (requires pandas); it is not used below.
    from impala.util import as_pandas

    # 21050 is the impalad HiveServer2 port that impyla connects to
    # (21000 is the Beeswax port used by impala-shell).
    conn = connect(host='yourHost', port=21050, auth_mechanism='GSSAPI')

    cursor = conn.cursor()
    cursor.execute("SHOW TABLES")
    # After .execute(), Impala keeps the result set on the server until it
    # is fetched. .fetchall() pulls the entire result set over the network,
    # so only use it when you know the result set is small.
    tables = cursor.fetchall()

    print("Displaying list of tables")
    # The result is a list of tuples.
    for t in tables:
        # Each row of SHOW TABLES holds a single value: the table name.
        print(t[0])
        # Uncomment the next line to print only the first table.
        # break

    print("eol >>>")
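The comment about `.fetchall()` suggests fetching large results in pieces instead. PEP 249 cursors also provide `fetchmany(size)`. The sketch below uses an in-memory `sqlite3` database as a stand-in so it runs without a cluster. Applying the same loop to an impyla cursor is an assumption based on impyla following the DB-API. `iter_rows` is a hypothetical helper.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows from an already-executed DB-API cursor batch by batch,
    instead of materialising the whole result set with fetchall()."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


if __name__ == "__main__":
    # In-memory SQLite stands in for an Impala connection here.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE t (name TEXT)")
    cur.executemany("INSERT INTO t VALUES (?)", [("a",), ("b",), ("c",)])
    cur.execute("SELECT name FROM t ORDER BY name")
    for (name,) in iter_rows(cur, batch_size=2):
        print(name)
    conn.close()
```

With impyla, `iter_rows(cursor)` would replace `cursor.fetchall()` right after `cursor.execute(...)`, keeping at most `batch_size` rows in memory at a time.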
answered Oct 30 '22 14:10 by s_mj