I'm trying to connect to HDFS through PyArrow, but it fails because the libhdfs library cannot be loaded.
libhdfs.so is in $HADOOP_HOME/lib/native as well as in $ARROW_LIBHDFS_DIR.
import os
from pyarrow import hdfs
print(os.environ['ARROW_LIBHDFS_DIR'])
fs = hdfs.connect()
bash-3.2$ ls $ARROW_LIBHDFS_DIR
examples libhadoop.so.1.0.0 libhdfs.a libnativetask.a
libhadoop.a libhadooppipes.a libhdfs.so libnativetask.so
libhadoop.so libhadooputils.a libhdfs.so.0.0.0 libnativetask.so.1.0.0
The error I'm getting:
Traceback (most recent call last):
  File "wine-pred-ml.py", line 31, in <module>
    fs = hdfs.connect()
  File "/Users/PVZP/Library/Python/2.7/lib/python/site-packages/pyarrow/hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "/Users/PVZP/Library/Python/2.7/lib/python/site-packages/pyarrow/hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libhdfs
This solves my issue:
conda install libhdfs3 pyarrow
Then, in your script.py, before connecting:
import os
os.environ['ARROW_LIBHDFS_DIR'] = '/opt/cloudera/parcels/CDH/lib64/'
where the path is the directory in which libhdfs3 lives. In my case, this is where Cloudera hosts the library.
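Before calling hdfs.connect(), it can help to confirm that the directory in ARROW_LIBHDFS_DIR really contains a library file. The following is a minimal sketch of such a check, not PyArrow's own lookup logic: find_hdfs_library is a hypothetical helper, and the candidate file names (.so on Linux, .dylib on macOS) are an assumption based on the usual platform naming conventions.

```python
import os
import sys


def find_hdfs_library(directory, platform=sys.platform):
    """Return the path of the first HDFS client library found in directory, or None.

    Hypothetical helper: the candidate names below are an assumption,
    not a list taken from PyArrow itself.
    """
    if not directory:
        return None
    if platform == "darwin":
        candidates = ["libhdfs.dylib", "libhdfs3.dylib"]
    else:
        candidates = ["libhdfs.so", "libhdfs3.so"]
    for name in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    lib_dir = os.environ.get("ARROW_LIBHDFS_DIR", "")
    print("ARROW_LIBHDFS_DIR = %r" % lib_dir)
    print("library found: %r" % find_hdfs_library(lib_dir))
```

If this prints None for the library, the environment variable probably points at the wrong directory, or at a directory holding a build for a different platform.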