
pandas write dataframe to parquet format with append

I am trying to write a pandas dataframe to the Parquet file format (introduced in the most recent pandas version, 0.21.0) in append mode. However, instead of appending to the existing file, the file is overwritten with the new data. What am I missing?

The write syntax is:

df.to_parquet(path, mode='append')

The read syntax is:

pd.read_parquet(path)
Siraj S. asked Nov 08 '17 23:11




2 Answers

To append, write to a Parquet dataset (a directory of Parquet files) with pyarrow rather than to a single file:

import pandas as pd 
import pyarrow.parquet as pq
import pyarrow as pa

dataframe = pd.read_csv('content.csv')
output = "/Users/myTable.parquet"

# Create a parquet table from your dataframe
table = pa.Table.from_pandas(dataframe)

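# Write the table as a new file under the dataset directory
pq.write_to_dataset(table, root_path=output)

Each call adds a new Parquet file under root_path and leaves the existing files untouched, so the data is appended at the dataset (directory) level rather than inside a single file. Reading the directory with pd.read_parquet(output) returns the rows from all of the files.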

Victor Faro answered Sep 23 '22 16:09


DataFrame.to_parquet() has no append mode. What you can do instead is read the existing file, concatenate the new rows onto it, and write the result back, overwriting the original file.

ben26941 answered Sep 25 '22 16:09