
Unable to load libhdfs when using pyarrow

I'm trying to connect to HDFS through PyArrow, but it fails because the libhdfs library cannot be loaded.

libhdfs.so is in $HADOOP_HOME/lib/native as well as in $ARROW_LIBHDFS_DIR.

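import os
from pyarrow import hdfs
print(os.environ['ARROW_LIBHDFS_DIR'])  # print the variable to check it is set
fs = hdfs.connect()  # raises ArrowIOError: Unable to load libhdfs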


bash-3.2$ ls $ARROW_LIBHDFS_DIR
examples        libhadoop.so.1.0.0  libhdfs.a       libnativetask.a
libhadoop.a     libhadooppipes.a    libhdfs.so      libnativetask.so
libhadoop.so        libhadooputils.a    libhdfs.so.0.0.0    libnativetask.so.1.0.0
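To double-check which loadable libhdfs files the two directories actually contain, a small script like the following can help. This is a minimal sketch, not part of the original question: the helper name `find_libhdfs` is hypothetical, and the file-name patterns (`libhdfs.so*` on Linux, `libhdfs.dylib` on macOS) are assumptions. The `.dylib` pattern is included because the paths in the traceback suggest macOS. Static `.a` archives are skipped because they cannot be loaded at runtime.

```python
import glob
import os


def find_libhdfs(directories):
    """Return paths of libhdfs shared-library files found in the given directories.

    The patterns below are assumptions about platform naming; static
    archives such as libhdfs.a are deliberately not matched.
    """
    patterns = ("libhdfs.so*", "libhdfs.dylib")
    found = []
    for directory in directories:
        if not directory or not os.path.isdir(directory):
            continue  # unset variable or missing directory
        for pattern in patterns:
            found.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return found


if __name__ == "__main__":
    hadoop_home = os.environ.get("HADOOP_HOME", "")
    candidates = [
        os.path.join(hadoop_home, "lib", "native") if hadoop_home else "",
        os.environ.get("ARROW_LIBHDFS_DIR", ""),
    ]
    paths = find_libhdfs(candidates)
    if not paths:
        print("no libhdfs shared library found")
    for path in paths:
        print(path)
```

If this prints nothing for a directory that `ls` shows as containing `libhdfs.so`, the environment variable in the Python process probably differs from the one in the shell.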

The error I'm getting:

Traceback (most recent call last):
  File "wine-pred-ml.py", line 31, in <module>
    fs = hdfs.connect()
  File "/Users/PVZP/Library/Python/2.7/lib/python/site-packages/pyarrow/hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "/Users/PVZP/Library/Python/2.7/lib/python/site-packages/pyarrow/hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libhdfs
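The traceback only says "Unable to load libhdfs", which hides the underlying reason. One way to surface the operating system's own loader message is to open the same file directly with `ctypes`. This is a hedged diagnostic sketch: `try_load` is a hypothetical helper, it does not replicate exactly how pyarrow loads the library, and a successful load here does not guarantee that `hdfs.connect()` will succeed. A missing dependency (libhdfs is typically linked against the JVM library) would be reported here by name.

```python
import ctypes
import os


def try_load(path):
    """Try to load a shared library; return (True, None) or (False, error text).

    ctypes.CDLL raises OSError with the dynamic loader's message when the
    file is missing, is not a valid library, or has unresolved dependencies.
    """
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    lib_dir = os.environ.get("ARROW_LIBHDFS_DIR", "")
    ok, err = try_load(os.path.join(lib_dir, "libhdfs.so"))
    print("loaded" if ok else "failed: %s" % err)
```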
Pablo Velasquez asked Oct 31 '18 16:10


1 Answer

This solves my issue:

conda install libhdfs3 pyarrow

Then, in your script.py, set ARROW_LIBHDFS_DIR before calling hdfs.connect():

import os
os.environ['ARROW_LIBHDFS_DIR'] = '/opt/cloudera/parcels/CDH/lib64/'
from pyarrow import hdfs
fs = hdfs.connect()

where the path is the directory containing libhdfs3; in my case, this is where Cloudera hosts the library.
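If you want the script to fail early with a clearer message than "Unable to load libhdfs", you could validate the directory before exporting it. This is a minimal sketch under assumptions: `set_libhdfs_dir` is a hypothetical helper, and treating any non-`.a` file whose name starts with `libhdfs` as a usable shared library is a heuristic, not something pyarrow checks itself.

```python
import os


def set_libhdfs_dir(path):
    """Validate a directory and export it as ARROW_LIBHDFS_DIR.

    Call this before hdfs.connect(). Raises IOError if the directory is
    missing or holds no shared libhdfs* file (static .a archives are ignored).
    Returns the sorted names of the matching files.
    """
    if not os.path.isdir(path):
        raise IOError("not a directory: %s" % path)
    shared = [
        name for name in os.listdir(path)
        if name.startswith("libhdfs") and not name.endswith(".a")
    ]
    if not shared:
        raise IOError("no shared libhdfs library in %s" % path)
    os.environ["ARROW_LIBHDFS_DIR"] = path
    return sorted(shared)


# Hypothetical usage with the Cloudera path from this answer:
#   set_libhdfs_dir('/opt/cloudera/parcels/CDH/lib64/')
#   from pyarrow import hdfs
#   fs = hdfs.connect()
```

The environment variable is only modified once the directory passes both checks, so a bad path leaves any previous setting untouched.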

b0lle answered Nov 07 '22 15:11