How to dump a file to a Hadoop HDFS directory using Python pickle?

Question

I am on a VM in a directory that contains my Python (2.7) class. I am trying to pickle an instance of my class to a directory in my HDFS.

I'm trying to do something along the lines of:

import pickle

my_obj = MyClass() # the class instance that I want to pickle

with open('hdfs://domain.example.com/path/to/directory/') as hdfs_loc:
    pickle.dump(my_obj, hdfs_loc)

From the research I've done, I think something like snakebite might help, but does anyone have more concrete suggestions?

Rene B. · Accepted Answer

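If you use PySpark, you can use the RDD method saveAsPickleFile. Wrap the object in a list first, because sc.parallelize expects a collection of elements rather than a single object:

# sc is an existing SparkContext; the list makes my_obj the RDD's only element
temp_rdd = sc.parallelize([my_obj])
# Spark creates a directory at this path holding part files, not one plain file
temp_rdd.coalesce(1).saveAsPickleFile("/test/tmp/data/destination.pickle")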
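
If Spark is not available, a different route (not part of the answer above) is to pickle in memory and stream the bytes to the `hdfs dfs -put` command, which reads from stdin when the source is `-`. The built-in open() only understands local paths, which is why the attempt in the question cannot work as written. The following is a minimal sketch, assuming the `hdfs` command-line client is installed and configured on the VM; the function name `pickle_to_hdfs` and its `put_cmd` parameter are made up for illustration.

```python
import pickle
import subprocess


def pickle_to_hdfs(obj, hdfs_path, put_cmd=("hdfs", "dfs", "-put", "-")):
    """Pickle obj in memory and pipe the bytes to an HDFS put command.

    put_cmd is a hypothetical hook for the command prefix; the default
    assumes the `hdfs` CLI is on PATH. The trailing "-" tells
    `hdfs dfs -put` to read its source from stdin.
    """
    # Protocol 2 can be read back by both Python 2.7 and Python 3.
    payload = pickle.dumps(obj, protocol=2)

    # Start the put command with the destination appended, and feed it
    # the pickled bytes on stdin.
    proc = subprocess.Popen(list(put_cmd) + [hdfs_path], stdin=subprocess.PIPE)
    proc.communicate(payload)

    if proc.returncode != 0:
        raise RuntimeError("put command failed with exit code %d" % proc.returncode)

    # Return the number of bytes sent, which is handy for logging.
    return len(payload)
```

A call would look like pickle_to_hdfs(my_obj, "/path/to/directory/my_obj.pickle"). The destination has to be a file path inside the directory, not the directory itself, and the class definition must be importable wherever the pickle is later loaded.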

Tags: python, hadoop, hdfs

J. Appleseed