I am trying to write a pandas DataFrame to the Parquet file format (introduced in the most recent pandas version, 0.21.0) in append mode. However, instead of appending to the existing file, the file is overwritten with the new data. What am I missing?
The write syntax is:
df.to_parquet(path, mode='append')
The read syntax is:
pd.read_parquet(path)
The pandas DataFrame to_parquet() function writes a DataFrame to the binary Parquet format. Its path argument is a file path or a root directory path; it is used as the root directory when writing a partitioned dataset.
Parquet splits each column into chunks and allows a column to be stored as several chunks within a single file, so appending is possible in principle.
Append to or overwrite an existing Parquet file: using the append save mode, you can append a DataFrame to an existing Parquet file. To overwrite it instead, use the overwrite save mode. (Save modes are a feature of Spark's DataFrame writer, not of pandas.)
To append, do this:
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
dataframe = pd.read_csv('content.csv')
# write_to_dataset treats this path as a dataset directory, not a single file
output = "/Users/myTable.parquet"
# Create an Arrow table from your DataFrame
table = pa.Table.from_pandas(dataframe)
# Write the table as a new Parquet file inside the dataset directory
pq.write_to_dataset(table, root_path=output)
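This will automatically append to your table: each call adds a new Parquet file under root_path rather than modifying an existing file, and reading root_path back returns the rows from all of those files.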
There is no append mode in pandas.to_parquet(). What you can do instead is read the existing file, add the new rows to it, and write the result back, overwriting the original file.