
Load a gz file directly into a pandas DataFrame

I have this gz file from dati.istat.it: inside it is a CSV file (with a different name) that I want to load directly into a pandas DataFrame.

If I unzip it with 7-Zip, I can easily load it with this code: pd.read_csv("DCCV_OCCUPATIT_Data+FootnotesLegend_175b2401-3654-4673-9e60-b300989088bb.csv", sep="|", engine="python")

How can I do it without unzipping with 7-Zip first?

Thanks a lot!

Marco Scarselli asked Jan 30 '16 11:01



1 Answer

You can use the zipfile library, which works if the file is actually a ZIP archive despite its .gz extension:

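import pandas as pd
import zipfile

# Open the archive (assumed to be a ZIP file despite the .gz extension)
z = zipfile.ZipFile('test/file.gz')
# Read the CSV member straight from the archive, without extracting it to disk
print(pd.read_csv(z.open("DCCV_OCCUPATIT_Data+FootnotesLegend_175b2401-3654-4673-9e60-b300989088bb.csv"),
                  sep="|",
                  engine="python"))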
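
If the name of the CSV inside the archive is not known in advance, it can be looked up with namelist() instead of being hard-coded. The following is only a minimal sketch: the helper read_first_csv_from_zip and the in-memory demo archive are hypothetical names made up for illustration, and it assumes the file really is a ZIP archive.

```python
import io
import zipfile

import pandas as pd


def read_first_csv_from_zip(path_or_buffer, **read_csv_kwargs):
    """Load the first .csv member of a ZIP archive into a DataFrame.

    Hypothetical helper for illustration; extra keyword arguments are
    passed through to pd.read_csv unchanged.
    """
    with zipfile.ZipFile(path_or_buffer) as archive:
        # Collect the members whose name ends in .csv (case-insensitive)
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no .csv member found in archive")
        # Read the member directly from the archive, without extracting it
        with archive.open(csv_names[0]) as member:
            return pd.read_csv(member, **read_csv_kwargs)


# Build a tiny in-memory archive so the sketch runs without any real file
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("inner_name.csv", "a|b\n1|2\n3|4\n")
buffer.seek(0)

df = read_first_csv_from_zip(buffer, sep="|")
print(df)
```

With a file on disk, the same helper would be called as read_first_csv_from_zip('test/file.gz', sep="|").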

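At the time of this answer, pandas supported only gzip and bz2 in read_csv (newer versions also accept other formats such as zip and xz), as the documentation stated: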

compression : {‘gzip’, ‘bz2’, ‘infer’, None}, default ‘infer’

For on-the-fly decompression of on-disk data. If ‘infer’, then use gzip or bz2 if filepath_or_buffer is a string ending in ‘.gz’ or ‘.bz2’, respectively, and no decompression otherwise. Set to None for no decompression.
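
For comparison, here is a small sketch of how the compression parameter behaves with a file that really is gzip-compressed. The temporary file sample.csv.gz is created only for this demonstration and is not the asker's file; a ZIP archive that merely carries a .gz extension would not be readable this way.

```python
import gzip
import os
import tempfile

import pandas as pd

# Create a genuine gzip-compressed CSV in a temporary directory (demo data only)
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "sample.csv.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as handle:
    handle.write("a|b\n1|2\n3|4\n")

# Default compression='infer': the .gz suffix selects gzip decompression
df_inferred = pd.read_csv(gz_path, sep="|")

# Stating the compression explicitly gives the same result
df_explicit = pd.read_csv(gz_path, sep="|", compression="gzip")

print(df_inferred)
```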

jezrael answered Oct 02 '22 20:10
