Using csvreader against a gzipped file in Python

Tags: python, csv, gzip

I have a bunch of gzipped CSV files that I'd like to open for inspection using Python's built-in CSV reader. I'd like to do this without first having to unzip them to disk manually. I guess I want to get a stream of the uncompressed data and pass it to the CSV reader. Is this possible in Python?

Mike Chamberlain asked Feb 12 '12 21:02



2 Answers

Use the gzip module:

import csv
import gzip

with gzip.open(filename, mode='rt') as f:
    reader = csv.reader(f)
    #...

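The text mode flag is the important part. `gzip.open` defaults to binary mode, and in Python 3 `csv.reader` expects strings rather than bytes. Here is a minimal sketch of the difference, assuming Python 3; the file name `demo.csv.gz` and the sample rows are made up for illustration:

```python
import csv
import gzip
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
with gzip.open(path, "wt", newline="") as f:
    csv.writer(f).writerows([["a", "b"], ["1", "2"]])

# Binary mode (the default for gzip.open) yields bytes, which csv.reader rejects.
binary_mode_failed = False
try:
    with gzip.open(path) as f:
        list(csv.reader(f))
except csv.Error:
    binary_mode_failed = True

# Text mode decodes the decompressed stream to str, so csv.reader works directly.
with gzip.open(path, "rt", newline="") as f:
    rows = list(csv.reader(f))

print(binary_mode_failed, rows)
```

Nothing is unzipped to disk here; decompression happens as the reader pulls lines from the stream.
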
tzaman answered Sep 28 '22 00:09


I tried the above version for both writing and reading, and it didn't work in Python 3.3 because of a "bytes" error. After some trial and error, though, I got the following to work. Maybe it helps others too:

import csv
import gzip
import io

with gzip.open("test.gz", "w") as file:
    writer = csv.writer(io.TextIOWrapper(file, newline="", write_through=True))
    writer.writerow([1, 2, 3])
    writer.writerow([4, 5, 6])

with gzip.open("test.gz", "r") as file:
    reader = csv.reader(io.TextIOWrapper(file, newline=""))
    print(list(reader))

As amohr suggests, the following works as well:

import csv
import gzip

with gzip.open("test.gz", "wt", newline="") as file:
    writer = csv.writer(file)
    writer.writerow([1, 2, 3])
    writer.writerow([4, 5, 6])

with gzip.open("test.gz", "rt", newline="") as file:
    reader = csv.reader(file)
    print(list(reader))

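Since the question mentions a whole batch of files, the same text-mode approach could be wrapped in a small generator. This is only a sketch: `iter_gzipped_csv_rows` is a hypothetical helper name, the demo file names are made up, and the UTF-8 encoding is an assumption about the data rather than something stated in the question.

```python
import csv
import gzip
import os
import tempfile


def iter_gzipped_csv_rows(paths, encoding="utf-8"):
    """Yield (path, row) pairs from each gzipped CSV file without unzipping to disk."""
    for path in paths:
        # "rt" decompresses on the fly and decodes to str;
        # newline="" leaves line-ending handling to the csv module.
        with gzip.open(path, "rt", newline="", encoding=encoding) as f:
            for row in csv.reader(f):
                yield path, row


# Demonstration with two made-up files in a temporary directory.
tmpdir = tempfile.mkdtemp()
paths = []
for name, data in [("a.csv.gz", [[1, 2, 3]]), ("b.csv.gz", [[4, 5, 6]])]:
    path = os.path.join(tmpdir, name)
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(data)
    paths.append(path)

all_rows = [row for _, row in iter_gzipped_csv_rows(paths)]
print(all_rows)
```

Note that the csv module returns every field as a string, so the numbers written above come back as `'1'`, `'2'`, and so on.
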
Gerenuk answered Sep 28 '22 01:09