Python: save pandas data frame to parquet file

Is it possible to save a pandas data frame directly to a parquet file? If not, what would be the suggested process?

The aim is to send the parquet file to another team, who will read it using Scala code. Thanks!

Edamame asked Dec 09 '16 18:12




1 Answer

Pandas DataFrames have a built-in to_parquet() method (added in pandas 0.21.0). Just write the dataframe to parquet format like this:

df.to_parquet('myfile.parquet') 

You still need to install a parquet library such as pyarrow or fastparquet. If you have more than one parquet library installed, you can specify which engine you want pandas to use; otherwise the default, engine='auto', tries pyarrow first and falls back to fastparquet if pyarrow is unavailable (as in the documentation). For example:

df.to_parquet('myfile.parquet', engine='fastparquet') 
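If you are not sure which engine is installed on a given machine, something like the following sketch may help. It checks which of the two engines can be imported and passes that name to to_parquet() explicitly. The helper names installed_parquet_engines, save_parquet and demo are made up for illustration and are not part of pandas; the demo assumes pandas plus at least one engine is installed.

```python
import importlib.util


def installed_parquet_engines():
    """List the Parquet engines that are importable in this environment.

    The order matches what pandas tries with engine='auto':
    pyarrow first, then fastparquet.
    """
    return [
        name
        for name in ("pyarrow", "fastparquet")
        if importlib.util.find_spec(name) is not None
    ]


def save_parquet(df, path, engine=None):
    """Write df to path as Parquet and return the engine name that was used.

    Hypothetical helper (not a pandas function): it only wraps
    df.to_parquet() with an explicit engine check.
    """
    engines = installed_parquet_engines()
    if engine is None:
        if not engines:
            raise ImportError("Install pyarrow or fastparquet to write Parquet files")
        engine = engines[0]
    elif engine not in engines:
        raise ImportError(f"Parquet engine {engine!r} is not installed")
    df.to_parquet(path, engine=engine)
    return engine


def demo():
    """Write a small frame to a temporary file and read it back."""
    import os
    import tempfile

    import pandas as pd

    df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "myfile.parquet")
        engine = save_parquet(df, path)
        restored = pd.read_parquet(path, engine=engine)
    return restored
```

Calling demo() should return a frame with the same columns as the one written. The resulting file is an ordinary Parquet file, so the other team should be able to open it from Scala, for example with Spark's Parquet reader.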
ben26941 answered Sep 21 '22 03:09