
Convert Parquet to CSV

How can I convert Parquet to CSV from a local file system (e.g. with Python or some library) but WITHOUT Spark? I'm looking for as simple and minimalistic a solution as possible, because I need to automate everything and don't have many resources.

I tried e.g. parquet-tools on my Mac, but the data output did not look correct.

I need the output to be such that when data is not present in some columns, the CSV has a corresponding NULL (an empty field between two commas).

Thanks.

Joe asked Jul 06 '18 17:07





1 Answer

You can do this by using the Python packages pandas and pyarrow (pyarrow is an optional dependency of pandas that you need for this feature).

    import pandas as pd

    df = pd.read_parquet('filename.parquet')
    df.to_csv('filename.csv')
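The question also asks for missing values to show up as empty fields. A minimal sketch of how that could look with pandas follows. The helper names `frame_to_csv_text` and `parquet_to_csv` and the sample data are made up for illustration. `index=False` is added on the assumption that the pandas row index is not wanted in the output, and `na_rep=""` just spells out the pandas default for missing values.

```python
import io

import pandas as pd


def frame_to_csv_text(df: pd.DataFrame) -> str:
    """Serialize df to CSV text, leaving missing values as empty fields."""
    buffer = io.StringIO()
    # index=False drops the row index that to_csv would otherwise write
    # as an extra first column.
    # na_rep="" (the pandas default) renders missing values as empty fields.
    df.to_csv(buffer, index=False, na_rep="")
    return buffer.getvalue()


def parquet_to_csv(parquet_path: str, csv_path: str) -> None:
    """Hypothetical helper: read a local Parquet file and write it as CSV.

    Requires a Parquet engine such as pyarrow to be installed.
    """
    df = pd.read_parquet(parquet_path)
    df.to_csv(csv_path, index=False, na_rep="")


# Small in-memory frame standing in for the contents of a Parquet file.
sample = pd.DataFrame({"id": [1, 2], "name": ["a", None], "score": [None, 2.5]})
csv_text = frame_to_csv_text(sample)
print(csv_text)
```

In the printed text, the row for `id` 2 has nothing between the two commas where `name` is missing. For a real file, something like `parquet_to_csv('filename.parquet', 'filename.csv')` would apply the same settings.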

When you need to make modifications to the contents of the file, you can use standard pandas operations on df.
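As a rough illustration of such modifications, the sketch below drops a column, renames another, and filters rows before producing the CSV. The column names and values are invented, and the in-memory frame only stands in for the result of `pd.read_parquet`.

```python
import pandas as pd

# Invented data standing in for a frame loaded with pd.read_parquet(...).
df = pd.DataFrame(
    {"id": [1, 2, 3], "city": ["Oslo", None, "Rome"], "tmp": [0, 0, 0]}
)

# Drop a column that should not appear in the CSV.
df = df.drop(columns=["tmp"])
# Rename a column to the header wanted in the output.
df = df.rename(columns={"city": "location"})
# Keep only the rows that satisfy a condition.
df = df[df["id"] > 1]

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name to `to_csv` instead writes the same content to disk.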

Uwe L. Korn answered Oct 28 '22 01:10
