pyhs2/hive: "No files matching path" error even though the file exists

Tags:

hive

hdfs

Using the hive or beeline client, I have no problem executing this statement:

hive -e "LOAD DATA LOCAL INPATH '/tmp/tmpBKe_Mc' INTO TABLE unit_test_hs2"

The data from the file is loaded successfully into hive.

However, when using pyhs2 from the same machine, the file is not found:

import pyhs2

conn_str = {'authMechanism': 'NOSASL', 'host': 'azus'}
# pyhs2.connect() takes keyword arguments, so unpack the settings dict
conn = pyhs2.connect(**conn_str)
with conn.cursor() as cur:
    cur.execute("LOAD DATA LOCAL INPATH '/tmp/tmpBKe_Mc' INTO TABLE unit_test_hs2")

It throws this exception:

Traceback (most recent call last):
  File "data_access/hs2.py", line 38, in write
    cur.execute("LOAD DATA LOCAL INPATH '%s' INTO TABLE %s" % (csv_file.name, table_name))
  File "/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.py", line 63, in execute
    raise Pyhs2Exception(res.status.errorCode, res.status.errorMessage)
pyhs2.error.Pyhs2Exception: "Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/tmp/tmpBKe_Mc'': No files matching path file:/tmp/tmpBKe_Mc"

I've seen similar questions posted about this problem, and the usual answer is that the query is running on a different server that doesn't have the local file '/tmp/tmpBKe_Mc' stored on it. However, if that is the case, why would running the command directly from the CLI work but using pyhs2 not work?

(Secondary question: how can I show which server is trying to handle the query? I've tried cur.execute("set"), which returns all configuration parameters but when grepping for "host" the returned parameters don't seem to contain a real hostname.)

Thanks!

asked Nov 10 '22 22:11 by John Prior


1 Answer

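This happens because pyhs2 sends the statement to HiveServer2, which resolves a LOCAL INPATH on the server where HiveServer2 runs (the cluster side), not on the client machine running your script.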

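The solution is to save the source file to an HDFS location instead of the local /tmp directory, and load it with LOAD DATA INPATH (without LOCAL), so the path resolves regardless of which host executes the query.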
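A minimal sketch of that approach, assuming the `hdfs` command-line client is installed on the machine running the script and that the HDFS staging directory (passed in as `hdfs_dir`, a hypothetical location) already exists and is readable by Hive. The helper names `hdfs_put_command`, `load_data_statement` and `load_via_hdfs` are illustrative only, not part of pyhs2. The upload runs on the client, where the file actually is; the LOAD DATA statement then refers only to HDFS.

```python
import posixpath
import subprocess


def hdfs_put_command(local_path, hdfs_dir):
    """Return the argv for copying a local file into an HDFS directory."""
    # Assumes the `hdfs` CLI is installed on this (client) machine;
    # -f overwrites a file of the same name already in hdfs_dir.
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]


def load_data_statement(hdfs_path, table_name):
    """Return a LOAD DATA statement for a file that already lives in HDFS."""
    # Rather than guess at Hive's escaping rules for the path literal,
    # refuse paths that contain a single quote.
    if "'" in hdfs_path:
        raise ValueError("path must not contain a single quote: %r" % hdfs_path)
    # No LOCAL keyword: the path is resolved in HDFS, so it no longer
    # matters which host HiveServer2 runs on.
    return "LOAD DATA INPATH '%s' INTO TABLE %s" % (hdfs_path, table_name)


def load_via_hdfs(cursor, local_path, hdfs_dir, table_name,
                  run=subprocess.check_call):
    """Upload local_path to hdfs_dir from the client, then load it into table_name.

    `cursor` is anything with an execute(sql) method, such as a pyhs2 cursor.
    `run` executes the upload command; it can be replaced with a stub in tests.
    Returns the HDFS path that was loaded.
    """
    run(hdfs_put_command(local_path, hdfs_dir))
    hdfs_path = posixpath.join(hdfs_dir, posixpath.basename(local_path))
    cursor.execute(load_data_statement(hdfs_path, table_name))
    return hdfs_path
```

With a pyhs2 cursor, the call would look like `load_via_hdfs(cur, csv_file.name, '/tmp/staging', table_name)`, where `/tmp/staging` is a placeholder HDFS directory. Note that without LOCAL, Hive moves (rather than copies) the file from the staging directory into the table's storage location.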

answered Jan 04 '23 01:01 by itsavy