
Open a csv.gz file in Python and print first 100 rows

Tags: python, csv

Using Python, I'm trying to read only the first 100 rows of a csv.gz file that has over 4 million rows. I also want the number of columns and the header of each column. How can I do this?

I looked at "python: read lines from compressed text files" to figure out how to open the file, but I'm struggling to figure out how to actually print the first 100 rows and get some metadata on the columns.

I found "Read first N lines of a file in python", but I'm not sure how to combine it with opening the csv.gz file and reading it without saving an uncompressed csv file.

I have written this code:

import gzip
import csv
import json
import pandas as pd


df = pd.read_csv('google-us-data.csv.gz', compression='gzip', header=0,
                 sep=' ', quotechar='"', error_bad_lines=False)
for i in range(100):
    print df.next()

I'm new to Python and I don't understand the results. I'm sure my code is wrong and I've been trying to debug it but I don't know which documentation to look at.

I get these results (and it keeps going down the console - this is an excerpt):

Skipping line 63: expected 3 fields, saw 7
Skipping line 64: expected 3 fields, saw 7
Skipping line 65: expected 3 fields, saw 7
Skipping line 66: expected 3 fields, saw 7
Skipping line 67: expected 3 fields, saw 7
Skipping line 68: expected 3 fields, saw 7
Skipping line 69: expected 3 fields, saw 7
Skipping line 70: expected 3 fields, saw 7
Skipping line 71: expected 3 fields, saw 7
Skipping line 72: expected 3 fields, saw 7
Asked by SizzyNini, Sep 22 '16 at 17:09



1 Answer

Pretty much what you've already done, except read_csv also has an nrows parameter that lets you specify the number of rows you want from the data set.

Additionally, to keep malformed lines from raising errors, you can set error_bad_lines to False. You'll still get warnings like the ones in your output (if that bothers you, set warn_bad_lines to False as well). They indicate that rows in your dataset have an inconsistent number of fields. Note that pandas 1.3 deprecated both options in favor of on_bad_lines, and pandas 2.0 removed them.

import pandas as pd
data = pd.read_csv('google-us-data.csv.gz', nrows=100, compression='gzip',
                   error_bad_lines=False)
print(data)
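
The question also asks for the number of columns and their headers, which can be read off the resulting DataFrame. The following is a minimal sketch, not the answerer's own code: the helper name preview_csv_gz is hypothetical, the comma default for sep is an assumption about the file, and on_bad_lines="skip" assumes pandas 1.3 or newer.

```python
import pandas as pd


def preview_csv_gz(path, n_rows=100, sep=","):
    """Read only the first n_rows of a gzipped CSV into a DataFrame.

    Hypothetical helper for illustration; the sep default is an assumption.
    """
    # nrows stops parsing early, so the rest of the file is never loaded.
    # on_bad_lines="skip" (pandas >= 1.3) drops rows with too many fields;
    # older releases used error_bad_lines=False for the same purpose.
    return pd.read_csv(
        path,
        nrows=n_rows,
        compression="gzip",
        sep=sep,
        on_bad_lines="skip",
    )


def describe_columns(data):
    """Return (number of columns, list of header names) for a DataFrame."""
    return data.shape[1], list(data.columns)
```

With the file from the question, this would be called as data = preview_csv_gz('google-us-data.csv.gz'), followed by describe_columns(data) to get the column count and header names.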

You can easily do something similar with the built-in csv library, but it'll require a for loop to iterate over the data, as shown in other examples.
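
For reference, the csv-based approach might look roughly like the sketch below. It reads the compressed stream directly, so no uncompressed copy is written to disk. The function name head_csv_gz is hypothetical, and the comma delimiter and UTF-8 encoding are assumptions about the file.

```python
import csv
import gzip
from itertools import islice


def head_csv_gz(path, n_rows=100, delimiter=",", encoding="utf-8"):
    """Return (headers, rows) for the first n_rows data rows of a gzipped CSV.

    Hypothetical helper; delimiter and encoding defaults are assumptions.
    """
    # "rt" opens the compressed stream in text mode; newline="" is the
    # setting the csv module recommends for file objects.
    with gzip.open(path, "rt", newline="", encoding=encoding) as f:
        reader = csv.reader(f, delimiter=delimiter)
        headers = next(reader, [])  # treat the first line as the header row
        # islice stops after n_rows, so the remaining rows are never read.
        rows = list(islice(reader, n_rows))
    return headers, rows
```

The number of columns is then len(headers), and the rows can be printed with an ordinary for loop over rows.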

Answered by HEADLESS_0NE, Sep 27 '22 at 22:09