I'm trying to get data from a zipped CSV file. Is there a way to do this without unzipping the whole file? If not, how can I unzip the file and read it efficiently?
Yes, you can. To read a zipped or tar.gz file into a pandas DataFrame, use read_csv: its compression argument provides on-the-fly decompression of on-disk data (support for .tar archives was added in pandas 1.5).
Reading one CSV file from a zip archive that contains multiple files. pandas cannot read a zip archive directly with read_csv if the archive contains more than one file; to handle this case, use Python's zipfile module. The zipfile module offers two routes for reading zip data: the ZipFile and Path classes.
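The multi-file case can be sketched as follows. This is only an illustration using the ZipFile route: the archive is built in memory so the snippet runs on its own, and the member names (sales.csv, users.csv, readme.txt) are hypothetical, not from the original question.

```python
import io
import zipfile

import pandas as pd

# Build a small in-memory archive with several members (hypothetical names),
# standing in for a real .zip file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("sales.csv", "id,amount\n1,5\n2,7\n")
    zf.writestr("users.csv", "id,name\n1,Ann\n2,Bob\n")
    zf.writestr("readme.txt", "not a csv")
buffer.seek(0)

# Open each CSV member as a file-like object and hand it to read_csv.
# Only the members that are opened get decompressed; nothing is written to disk.
frames = {}
with zipfile.ZipFile(buffer) as zf:
    for name in zf.namelist():
        if name.lower().endswith(".csv"):
            with zf.open(name) as member:
                frames[name] = pd.read_csv(member)

print(sorted(frames))
```

With a real archive, pass its path to zipfile.ZipFile instead of the buffer. To read a single member, skip the loop and call zf.open() with that member's name.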
Method #1: Using compression='zip' in the pandas read_csv() method. When the compression argument of read_csv() is set to 'zip', pandas first decompresses the archive and then creates the DataFrame from the CSV file inside it. The archive must contain exactly one data file for this to work.
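A minimal sketch of this method, assuming an archive that holds exactly one CSV. The file names example.zip and data.csv are made up for the illustration, and the archive is created in a temporary directory so the snippet is self-contained.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Create a small example archive holding exactly one CSV (hypothetical names).
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "example.zip")
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.csv", "id,value\n1,10\n2,20\n")

# pandas decompresses the single member on the fly; no manual extraction needed.
df = pd.read_csv(zip_path, compression="zip")
print(df)
```

Because the path ends in .zip, the default compression='infer' should pick the same codec, so passing compression='zip' mainly makes the intent explicit.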
I used the zipfile module to read the CSV straight from the ZIP into a pandas DataFrame. Let's say the file is named "intfile.csv" and it's inside an archive named "THEZIPFILE.zip":
import zipfile
import pandas as pd

# Open the archive and read one member directly, without extracting it to disk
zf = zipfile.ZipFile('C:/Users/Desktop/THEZIPFILE.zip')
df = pd.read_csv(zf.open('intfile.csv'))