
How to save dataframe to pickle file using Pyspark

Tags:

pickle

pyspark

I need to save a DataFrame to a pickle file, but the call raises an error:

df.saveAsPickleFile(path)

AttributeError: 'DataFrame' object has no attribute 'saveAsPickleFile'

adil blanco asked Mar 29 '18 14:03


1 Answer

saveAsPickleFile is a method of RDD, not of DataFrame, which is why the call fails with an AttributeError.

See the documentation: http://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=pickle

So call it on the DataFrame's underlying RDD instead:

df.rdd.saveAsPickleFile(filename)

To load it from file, run:

# Read the pickled rows back as an RDD; collect() is not needed, so the data stays distributed
pickleRdd = sc.pickleFile(filename)
df2 = spark.createDataFrame(pickleRdd)
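
For reference, here is a minimal round-trip sketch of the approach above. The helper names save_df_as_pickle, load_df_from_pickle and demo are made up for illustration, as are the sample rows and the local[1] master; only df.rdd.saveAsPickleFile, SparkContext.pickleFile and spark.createDataFrame come from the answer. Two things worth knowing: saveAsPickleFile writes a directory of part files (a SequenceFile of pickled objects), not a single file that pickle.load can open, and it fails if the target path already exists.

```python
import os
import tempfile


def save_df_as_pickle(df, path):
    """Write the rows of a DataFrame to `path` as a pickled RDD.

    `path` must not exist yet; Spark creates it as a directory.
    """
    df.rdd.saveAsPickleFile(path)


def load_df_from_pickle(spark, path):
    """Rebuild a DataFrame from a directory written by save_df_as_pickle."""
    # pickleFile returns an RDD of the original Row objects.
    rows = spark.sparkContext.pickleFile(path)
    # The schema is inferred from the Row objects.
    return spark.createDataFrame(rows)


def demo():
    """Hypothetical usage; requires pyspark to be installed."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("pickle-demo").getOrCreate()
    try:
        df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
        # Use a fresh sub-path so the target directory does not exist yet.
        path = os.path.join(tempfile.mkdtemp(), "df_pickle")
        save_df_as_pickle(df, path)
        df2 = load_df_from_pickle(spark, path)
        df2.show()
    finally:
        spark.stop()
```

If the goal is just to persist a DataFrame for later use in Spark, df.write.parquet(path) and spark.read.parquet(path) keep the schema explicitly and are usually the simpler choice; the pickle route is mainly useful when something downstream specifically expects a pickled RDD.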
Omri374 answered Dec 01 '22 16:12