I am using a Python 3.6 interpreter in my PyCharm venv and trying to convert a CSV file to Parquet.
import pandas as pd    
df = pd.read_csv('/parquet/drivers.csv')
df.to_parquet('output.parquet')
Error-1 ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. pyarrow or fastparquet is required for parquet support
Solution-1 Installed fastparquet 0.2.1
Error-2
File "/Users/python parquet/venv/lib/python3.6/site-packages/fastparquet/compression.py", line 131, in compress_data
    (algorithm, sorted(compressions)))
RuntimeError: Compression 'snappy' not available. Options: ['GZIP', 'UNCOMPRESSED']
I installed python-snappy 0.5.3, but I am still getting the same error. Do I need to install any other library?
If I use the PyArrow 0.12.0 engine, I don't experience the issue.
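In fastparquet, snappy compression is an optional feature: it is only offered when the snappy module can be imported at runtime. pandas' to_parquet requests snappy compression by default, which is why the call fails even though no compression was specified.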
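As a rough diagnostic, the sketch below checks which of the relevant modules the current interpreter can actually import. The helper names `is_importable` and `report_parquet_support` are made up for this illustration; `snappy` is the import name that python-snappy provides. Printing `sys.executable` can also help confirm that the script runs in the same venv where the packages were installed.

```python
import importlib
import sys


def is_importable(module_name):
    """Return True if module_name imports cleanly in this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def report_parquet_support():
    """Map each relevant module name to whether it can be imported."""
    # 'snappy' is the module installed by the python-snappy package.
    names = ["pyarrow", "fastparquet", "snappy"]
    return {name: is_importable(name) for name in names}


if __name__ == "__main__":
    # Shows which interpreter (and therefore which venv) is in use.
    print("Interpreter:", sys.executable)
    for name, ok in report_parquet_support().items():
        print(name, "importable" if ok else "not importable")
```

If `snappy` shows up as not importable here, fastparquet will most likely not offer snappy compression either.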
To quickly check a conversion from CSV to Parquet, you can run the following script (it only requires pandas and fastparquet):
import pandas as pd
from fastparquet import ParquetFile

df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "b", "c", "d"]})
# df.head()  # Check the initial data

# Round-trip through CSV to mimic the original workflow
df.to_csv("/tmp/test_csv", index=False)
df_csv = pd.read_csv("/tmp/test_csv")
df_csv.head()  # Check the intermediate data

# Write with the fastparquet engine, using GZIP so snappy is not needed
df_csv.to_parquet("/tmp/test_parquet", engine="fastparquet", compression="GZIP")

# Read the Parquet file back into a DataFrame
df_parquet = ParquetFile("/tmp/test_parquet").to_pandas()
df_parquet.head()  # Check the final data
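However, if you need to write or read using snappy compression, you can follow this answer about installing the snappy library on Ubuntu. python-snappy is a binding to the native snappy library, so that library has to be present on the system as well.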
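If the same script has to run both on machines with and without snappy, one option is to pick the compression at runtime. This is only a sketch: `pick_compression` is a hypothetical helper, and the mapping from codec name to module name is an assumption that covers snappy only.

```python
import importlib


def pick_compression(preferred="SNAPPY", fallback="GZIP"):
    """Return preferred if its Python binding imports, otherwise fallback."""
    # Assumption: only snappy needs an extra module ('snappy', from python-snappy).
    codec_modules = {"SNAPPY": "snappy"}
    module_name = codec_modules.get(preferred.upper())
    if module_name is None:
        # No extra module known to be required for this codec.
        return preferred
    try:
        importlib.import_module(module_name)
    except ImportError:
        return fallback
    return preferred


# Possible usage with the DataFrame from the question:
# df.to_parquet("output.parquet", engine="fastparquet",
#               compression=pick_compression())
```

The fallback defaults to GZIP because the error message lists it as one of the options fastparquet still supports without snappy.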