
Methods for writing Parquet files using Python?

I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can use Snappy or a similar compression mechanism in conjunction with it.

Thus far the only method I have found is using Spark with the pyspark.sql.DataFrame Parquet support.

I have some scripts that need to write Parquet files that are not Spark jobs. Is there any approach to writing Parquet files in Python that doesn't involve pyspark.sql?

asked by octagonC, Oct 05 '15 02:10



2 Answers

Update (March 2017): There are currently 2 libraries capable of writing Parquet files:

  1. fastparquet
  2. pyarrow

Both still seem to be under heavy development and come with a number of disclaimers (for example, no support for nested data), so you will have to check whether they support everything you need.

OLD ANSWER:

As of February 2016, there seems to be no Python-only library capable of writing Parquet files.

If you only need to read Parquet files, there is python-parquet.

As a workaround, you will have to rely on some other process such as pyspark.sql (which uses Py4J and runs on the JVM, and thus cannot be used directly from an average CPython program).

answered by rkrzr, Sep 21 '22 19:09


fastparquet does have write support. Here is a snippet that writes a pandas DataFrame to a file:

from fastparquet import write
# df is an existing pandas DataFrame
write('outfile.parq', df)
answered by Muayyad Alsadi, Sep 22 '22 19:09