
How to identify Pandas' backend for Parquet

I understand that Pandas can read and write to and from Parquet files using different backends: pyarrow and fastparquet.

I have a Conda environment based on the Intel distribution of Python, and it works: I can use pandas.DataFrame.to_parquet. However, I do not have pyarrow installed, so I assume fastparquet is being used (although I cannot find that package either).

Is there a way to identify which backend is used?

Asked Jun 08 '18 by Cedric H.

People also ask

Does pandas support parquet?

Pandas provides a beautiful Parquet interface. Pandas leverages the PyArrow library to write Parquet files, but you can also write Parquet files directly from PyArrow.
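As a rough illustration of the two routes (file names here are made up, and pyarrow is assumed to be installed), the same DataFrame can be written either through pandas or directly with PyArrow:

    # Minimal sketch: writing Parquet via pandas and directly via PyArrow.
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

    # Via pandas, which delegates to an installed engine (pyarrow by default):
    df.to_parquet("example_pandas.parquet")

    # Directly via PyArrow, bypassing pandas' engine selection:
    table = pa.Table.from_pandas(df)
    pq.write_table(table, "example_pyarrow.parquet")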

How do I write a pandas DataFrame to a parquet file?

The DataFrame.to_parquet() function writes a DataFrame to the binary Parquet format. Its path argument is either a file path or a root directory path; the latter is used as the root directory when writing a partitioned dataset.
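A short sketch of both cases (paths and column names are illustrative): writing a single file versus a partitioned dataset rooted at a directory.

    import pandas as pd

    df = pd.DataFrame({"year": [2020, 2020, 2021], "value": [1.0, 2.0, 3.0]})

    # Single file:
    df.to_parquet("data.parquet")

    # Partitioned dataset: "dataset_root/" becomes the root directory and one
    # subdirectory is written per distinct value of "year".
    df.to_parquet("dataset_root", partition_cols=["year"])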

Which is better PyArrow or Fastparquet?

According to benchmark comparisons, pyarrow is faster than fastparquet, so it is little wonder that it is the default engine used in Dask.
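Whichever engine you prefer, pandas lets you select it explicitly via the engine parameter instead of relying on the "auto" default (which tries pyarrow first, then fastparquet). A brief sketch, assuming both packages are installed and with illustrative file names:

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3]})

    df.to_parquet("with_pyarrow.parquet", engine="pyarrow")
    df.to_parquet("with_fastparquet.parquet", engine="fastparquet")

    # Reading works the same way:
    df2 = pd.read_parquet("with_pyarrow.parquet", engine="pyarrow")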


1 Answer

Just execute these two commands in a Linux shell/Bash:

pip install pyarrow

pip install fastparquet
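Installing both engines only guarantees that one is available; to see which backend pandas will actually pick, a quick check like the following works, since engine="auto" tries pyarrow first and falls back to fastparquet. This is a sketch: the import checks are ordinary Python, while get_engine is a private pandas helper that may change between versions.

    # Check which Parquet engines are installed (pandas' "auto" prefers pyarrow).
    try:
        import pyarrow
        print("pyarrow", pyarrow.__version__, "is installed (tried first by 'auto')")
    except ImportError:
        print("pyarrow is not installed")

    try:
        import fastparquet
        print("fastparquet", fastparquet.__version__, "is installed")
    except ImportError:
        print("fastparquet is not installed")

    # Optional: ask pandas' internal helper directly (private API, may change).
    from pandas.io.parquet import get_engine
    print(type(get_engine("auto")))  # e.g. PyArrowImpl or FastParquetImpl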
Answered Oct 05 '22 by ANKIT CHOPADE