
How can I open a .snappy.parquet file in python?

How can I open a .snappy.parquet file in Python 3.5? So far, I have tried this code:

import numpy
import pyarrow

filename = "/Users/T/Desktop/data.snappy.parquet" 
df = pyarrow.parquet.read_table(filename).to_pandas()

But it gives this error:

AttributeError: module 'pyarrow' has no attribute 'compat'

P.S. I installed pyarrow this way:

pip install pyarrow
user9439906 asked Oct 05 '18 01:10


1 Answer

I had the same issue and managed to solve it by following the solution proposed in https://github.com/dask/fastparquet/issues/366.

1) Install python-snappy using conda install (for some reason, I couldn't download it with pip install).

2) Add the snappy_decompress function, then read the file with fastparquet.

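from fastparquet import ParquetFile
import snappy

# Accepts an uncompressed_size argument but ignores it, passing only the
# compressed data on to snappy.decompress.
def snappy_decompress(data, uncompressed_size):
    return snappy.decompress(data)

pf = ParquetFile('filename')  # filename includes the .snappy.parquet extension
dff = pf.to_pandas()  # convert the Parquet file to a pandas DataFrame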
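If it is unclear which Parquet library is actually installed, a small check before reading may help. The following is only a sketch, not part of the linked fix: the helper names find_parquet_engine and read_snappy_parquet are made up for illustration, and it assumes pandas is installed alongside either pyarrow or fastparquet. It relies on pandas.read_parquet, which takes an engine argument; the Snappy codec is recorded in the file's own metadata, so the ".snappy" part of the file name should not need special handling.

```python
import importlib.util


def find_parquet_engine():
    """Return the name of the first installed Parquet engine, or None.

    Hypothetical helper: it only checks whether the package can be found,
    it does not import it.
    """
    for name in ("pyarrow", "fastparquet"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def read_snappy_parquet(path):
    """Read a Parquet file into a pandas DataFrame using the detected engine."""
    engine = find_parquet_engine()
    if engine is None:
        raise ImportError("Neither pyarrow nor fastparquet is installed")
    # Imported here so the engine check above works even without pandas.
    import pandas as pd
    return pd.read_parquet(path, engine=engine)
```

With the path from the question, the call would be read_snappy_parquet("/Users/T/Desktop/data.snappy.parquet"). If the pyarrow engine is picked and still raises an AttributeError, passing engine="fastparquet" to pandas.read_parquet directly is another thing to try.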
Bengi Koseoglu answered Sep 22 '22 09:09