
hdfs.connect() vs HdfsClient in PyArrow

I apologize if this is a noob question, but I couldn't find any relevant reference:

What is the difference between hdfs.connect() and HdfsClient?

If I'd like to read Parquet files from HDFS using pyarrow, which one should I use?

Jay asked Nov 20 '17 21:11


1 Answer

The HdfsClient API is deprecated; use pyarrow.hdfs.connect to connect instead: http://arrow.apache.org/docs/python/filesystems.html#hadoop-file-system-hdfs
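For the Parquet part of the question, here is a minimal sketch of how that could look. It assumes a pyarrow release that still ships the pyarrow.hdfs module (later releases deprecated it in favor of pyarrow.fs.HadoopFileSystem) and a working libhdfs/Hadoop client setup on the machine. The helper name read_parquet_from_hdfs and the example path are made up for illustration.

```python
def read_parquet_from_hdfs(path, host="default", port=0, user=None):
    """Read a single Parquet file from HDFS into a pyarrow Table.

    Hypothetical helper, not part of pyarrow itself.
    """
    # Imported inside the function so this module still loads
    # on machines where pyarrow is not installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    # host="default" and port=0 are the documented defaults of connect();
    # they should make libhdfs fall back to the cluster's own configuration.
    fs = pa.hdfs.connect(host=host, port=port, user=user)

    # Open the remote file for binary reading and let the Parquet
    # reader consume the file handle.
    with fs.open(path, "rb") as f:
        return pq.read_table(f)


# Example usage (the path is an assumption; replace it with a real one):
# table = read_parquet_from_hdfs("/data/example.parquet")
# df = table.to_pandas()
```

Opening the file through the connected filesystem and handing the handle to pyarrow.parquet.read_table keeps the HDFS connection details separate from the Parquet reading, so the same read call should also work with a local file object.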

Wes McKinney answered Oct 19 '22 15:10