
How to convert CSV to parquet file without RLE_DICTIONARY encoding?

I've already tested three ways of converting a CSV file to a Parquet file; you can find them below. All three created the Parquet file. When I try to view the contents of the file using "APACHE PARQUET VIEWER" on Windows, I always get the following error message:

"encoding RLE_DICTIONARY is not supported"

Is there any way to avoid this, for example by using another type of encoding? The code is below:

1º Using pandas:

import pandas as pd
df = pd.read_csv("filename.csv")
df.to_parquet("filename.parquet")

2º Using pyarrow:

from pyarrow import csv, parquet
table = csv.read_csv("filename.csv")
parquet.write_table(table, "filename.parquet")

3º Using dask:

from dask.dataframe import read_csv
dask_df = read_csv("filename.csv", dtype={'column_xpto': 'float64'})
dask_df.to_parquet("filename.parquet")
rcmv Avatar asked Oct 18 '25 15:10

1 Answer

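You should set use_dictionary to False. pandas forwards this keyword to its Parquet engine (pyarrow here), which then writes the columns without dictionary encoding: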

import pandas as pd
df = pd.read_csv("filename.csv")
df.to_parquet("filename.parquet", use_dictionary=False)
Alex Erygin Avatar answered Oct 21 '25 04:10
