What is the best way to create, write, or update a file in a remote HDFS from a local Python script?
I am able to list files and directories, but writing seems to be a problem.
I have looked at the hdfs and snakebite libraries, but neither of them gives a clean way to do this.
The use case is simple. We need to write the contents of a Pandas DataFrame to Hadoop's distributed filesystem, known as HDFS. We can call this work an HDFS Writer Micro-service, for example. In our case we can make it a tiny bit more complex (and realistic) by adding a Kerberos security requirement.
You can try using the underlying Java classes available through the SparkSession (tested in Spark 3.1, but it should also work for Spark 2). The dataStream.write() method takes bytes, so it can be used to write arbitrary binary data, or you can find methods that take other types in the stream's Java API documentation.
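A minimal sketch of that approach, assuming an active SparkSession named `spark` whose Hadoop configuration already points at the cluster. The function name `write_bytes_via_spark` is made up for illustration, `data_stream` plays the role of the `dataStream` mentioned above, and `_jvm` / `_jsc` are private PySpark attributes, so this may break between versions.

```python
def write_bytes_via_spark(spark, hdfs_path, data):
    """Write raw bytes to HDFS through the Hadoop FileSystem class on Spark's JVM.

    `spark` is assumed to be an active SparkSession; `hdfs_path` is a
    placeholder path chosen by the caller.
    """
    sc = spark.sparkContext
    jvm = sc._jvm                            # Py4J gateway to the JVM (private attribute)
    conf = sc._jsc.hadoopConfiguration()     # Hadoop configuration Spark is using
    fs = jvm.org.apache.hadoop.fs.FileSystem.get(conf)
    path = jvm.org.apache.hadoop.fs.Path(hdfs_path)
    data_stream = fs.create(path, True)      # True = overwrite an existing file
    try:
        data_stream.write(data)              # Py4J passes Python bytes as a Java byte[]
    finally:
        data_stream.close()
```

For a DataFrame you could pass something like `df.to_csv(index=False).encode('utf-8')` as `data`.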
Try the hdfs library, it's really good. You can use write(): https://hdfscli.readthedocs.io/en/latest/api.html#hdfs.client.Client.write
Example:
To create the connection:
from hdfs import InsecureClient
client = InsecureClient('http://host:port', user='ann')
from json import dump, dumps
records = [
    {'name': 'foo', 'weight': 1},
    {'name': 'bar', 'weight': 2},
]
# As a context manager:
with client.write('data/records.jsonl', encoding='utf-8') as writer:
    dump(records, writer)
# Or, passing the serialized string directly (overwrite=True because the file now exists):
client.write('data/records.jsonl', data=dumps(records), encoding='utf-8', overwrite=True)
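Note that `dump(records, writer)` writes a single JSON array even though the file is named `.jsonl`. If you actually want one JSON object per line, a rough sketch could look like this. It assumes `client` is an hdfs-library client like the one created above; `write_jsonl` is just an illustrative helper name, not part of the library.

```python
import json


def write_jsonl(client, hdfs_path, records, overwrite=True):
    """Write an iterable of dicts to HDFS as JSON Lines (one object per line).

    `client` is assumed to expose the hdfs library's write() context manager.
    """
    with client.write(hdfs_path, encoding='utf-8', overwrite=overwrite) as writer:
        for record in records:
            writer.write(json.dumps(record) + '\n')
```

Usage would be something like `write_jsonl(client, 'data/records.jsonl', records)`.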
For CSV you can do:
import pandas as pd
df = pd.read_csv("file.csv")
with client.write('path/output.csv', encoding='utf-8') as writer:
    df.to_csv(writer)
They use WebHDFS, which is not enabled by default and is insecure without Kerberos or Apache Knox. This is what the upload function of that hdfs library you linked to uses.
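To make the WebHDFS part concrete: creating a file over the REST API is a two-step PUT. The first PUT goes to the NameNode with `op=CREATE` and no body, the NameNode answers with a 307 redirect, and the data is then PUT to the DataNode URL in the `Location` header. The sketch below only builds the first-step URL with the standard library. The helper name `webhdfs_create_url` is hypothetical, and `user.name` is the simple (non-Kerberos) authentication parameter, so this is not a secure setup.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(namenode, hdfs_path, user, overwrite=False):
    """Build the first-step WebHDFS URL for creating a file.

    `namenode` is the NameNode's HTTP address, e.g. 'http://host:port'
    (placeholder, as in the connection example above).
    """
    query = urlencode({
        'op': 'CREATE',
        'user.name': user,  # simple auth only; a Kerberized cluster needs SPNEGO instead
        'overwrite': 'true' if overwrite else 'false',
    })
    path = quote(hdfs_path.lstrip('/'))  # keeps '/' separators, escapes other characters
    return f"{namenode.rstrip('/')}/webhdfs/v1/{path}?{query}"
```

An HTTP client would send a PUT to this URL without following redirects, read the `Location` header from the 307 response, and PUT the file contents there.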
You can use pyspark.
Example - How to write pyspark dataframe to HDFS and then how to read it back into dataframe?
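For the pandas use case, a rough sketch of the pyspark route might look like this. It assumes an active SparkSession `spark` that can reach the cluster; the helper name `save_pandas_via_spark` and the choice of Parquet with overwrite mode are my own assumptions, not something the linked example prescribes.

```python
def save_pandas_via_spark(spark, pandas_df, hdfs_uri):
    """Write a pandas DataFrame to HDFS via Spark, then read it back.

    `hdfs_uri` is a placeholder such as 'hdfs:///some/dir'; Spark writes a
    directory of part files there, not a single file.
    """
    spark_df = spark.createDataFrame(pandas_df)          # pandas -> Spark DataFrame
    spark_df.write.mode('overwrite').parquet(hdfs_uri)   # replace any existing output
    return spark.read.parquet(hdfs_uri)                  # read back as a Spark DataFrame
```

Calling `.toPandas()` on the returned DataFrame would bring the data back into pandas.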
snakebite has been mentioned, but it doesn't write files.
pyarrow has a FileSystem.open() function that should be able to write to HDFS as well, though I've not tried it.
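In newer pyarrow releases the filesystems under `pyarrow.fs` expose `open_output_stream()` for writing. A small untested-against-a-cluster sketch, assuming `fs` is a `pyarrow.fs.HadoopFileSystem` instance (which needs libhdfs and a Hadoop installation on the client machine); `write_bytes_with_pyarrow` is only an illustrative name.

```python
def write_bytes_with_pyarrow(fs, hdfs_path, data):
    """Write raw bytes to `hdfs_path` through a pyarrow filesystem object.

    `fs` is assumed to be a pyarrow.fs.FileSystem such as HadoopFileSystem.
    """
    # The stream is a context manager, so it is closed (and flushed) on exit.
    with fs.open_output_stream(hdfs_path) as stream:
        stream.write(data)
```

For a DataFrame, `data` could be `df.to_csv(index=False).encode('utf-8')`.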