Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Access Hive Data Using Python

Tags:

python

hive

I have some data in HDFS,i need to access that data using python,can anyone tell me how data is accessed from hive using python?

like image 763
subina mohanan Avatar asked Jul 18 '13 11:07

subina mohanan


2 Answers

To install you'll need these libraries:

pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive

If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev using apt-get or yum or whatever package manager. For Windows there are some options on GNU.org. On a Mac SASL should be available if you've installed xcode developer tools (xcode-select --install)

After installation, you can execute a hive query like this:

from pyhive import hive
conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")

Now that you have the hive connection, you have options how to use it. You can just straight-up query:

cursor = conn.cursor()
cursor.execute("SELECT cool_stuff FROM hive_table")
for result in cursor.fetchall():
  use_result(result)

...or to use the connection to make a Pandas dataframe:

import pandas as pd
df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
like image 77
Tristan Reid Avatar answered Nov 06 '22 05:11

Tristan Reid


You can use hive library for access hive from python,for that you want to import hive Class from hive import ThriftHive

Below the Example

import sys

from hive import ThriftHive
from hive.ttypes import HiveServerException

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
  transport = TSocket.TSocket('localhost', 10000)
  transport = TTransport.TBufferedTransport(transport)
  protocol = TBinaryProtocol.TBinaryProtocol(transport)
  client = ThriftHive.Client(protocol)
  transport.open()
  client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE)")
  client.execute("LOAD TABLE LOCAL INPATH '/path' INTO TABLE r")
  client.execute("SELECT * FROM r")
  while (1):
    row = client.fetchOne()
    if (row == None):
       break
    print row

  client.execute("SELECT * FROM r")
  print client.fetchAll()
  transport.close()
except Thrift.TException, tx:
  print '%s' % (tx.message)
like image 27
Sreejith Avatar answered Nov 06 '22 05:11

Sreejith