Don't understand Python's csv.reader object

I've come across a behavior in Python's built-in csv module that I'd never noticed before. Typically, when I read in a CSV, I follow the docs pretty much verbatim: I use 'with' to open the file, then loop over the reader object with a 'for' loop. However, I recently tried iterating over the csv.reader object twice in a row, only to find that the second 'for' loop did nothing.

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')

    for line in readit:
        print line

    for line in readit:
        print 'foo'

Console Output:

Austins-iMac:Desktop austin$ python -i amy.py 
['Amy', 'James', 'Nathan', 'Sara', 'Kayley', 'Alexis']
['James', 'Nathan', 'Tristan', 'Miles', 'Amy', 'Dave']
['Nathan', 'Amy', 'James', 'Tristan', 'Will', 'Zoey']
['Kayley', 'Amy', 'Alexis', 'Mikey', 'Sara', 'Baxter']
>>>
>>> readit
<_csv.reader object at 0x1023fa3d0>
>>> 

So the second 'for' loop basically does nothing. One thought I had is that the csv.reader object is being released from memory after being read once. That isn't the case, though, since it still retains its memory address. I found a post that mentions a similar problem. The reason they gave is that once the object is read, the pointer stays at the end of the memory address, ready to write data to the object. Is this correct? Could someone go into greater detail as to what is going on here? Is there a way to push the pointer back to the beginning of the memory address to reread it? I know it's bad coding practice to do that, but I'm mainly just curious and want to learn more about what goes on under Python's hood.

Thanks!

asked Dec 03 '14 06:12 by Austin A


2 Answers

I'll try to answer your other questions about what the reader is doing and why reset() or seek(0) might help. In the most basic form, the csv reader might look something like this:

def csv_reader(it):
    for line in it:
        yield line.strip().split(',')

That is, it takes any iterator producing strings and gives you a generator. All it does is take an item from your iterator, process it, and return the result. When your iterator is exhausted, csv_reader stops. The reader has no idea where the iterator came from or how to properly make a fresh one, so it doesn't even try to reset itself. That is left to the programmer.
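
To see the exhaustion concretely, here is a small sketch (Python 3 syntax) that runs the simplified csv_reader from above twice over the same data. The sample rows are made up for illustration, and csv_reader is only a stand-in for the real csv.reader, not its actual implementation.

```python
def csv_reader(it):
    # Simplified stand-in for csv.reader, same as the sketch above
    for line in it:
        yield line.strip().split(',')

rows = ['Amy,James', 'Nathan,Sara']   # made-up sample data
reader = csv_reader(rows)

first_pass = list(reader)    # consumes the generator completely
second_pass = list(reader)   # the generator is exhausted, nothing is left

print(first_pass)    # [['Amy', 'James'], ['Nathan', 'Sara']]
print(second_pass)   # []
```

The reader object still exists after the first pass, which is why it keeps its memory address; it simply has nothing more to yield.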

We can either modify the iterator in place without the reader knowing or just make a new reader. Here are some examples to demonstrate my point.

import csv

data = open('data.csv', 'r')
reader = csv.reader(data)

print(next(reader))               # Parse the first line
[next(data) for _ in range(5)]    # Skip the next 5 lines on the underlying iterator
print(next(reader))               # This will be the 7th line in data
print(reader.line_num)            # reader thinks this is the 2nd line
data.seek(0)                      # Go back to the beginning of the file
print(next(reader))               # gives first line again
data.close()                      # Done with the file

data = ['1,2,3', '4,5,6', '7,8,9']
reader = csv.reader(data)         # works fine on lists of strings too
print(next(reader))               # ['1', '2', '3']

In general, if you need a second pass, it's best to close and reopen your file and use a new csv reader. It's clean and ensures nice bookkeeping.
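
A minimal sketch of that reopen-and-reread approach, in Python 3 syntax. The helper name read_rows and the temporary file contents are assumptions made up for this example so that it runs on its own.

```python
import csv
import os
import tempfile

# Write a tiny throwaway CSV so the example is self-contained (made-up data)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('Amy,James\nNathan,Sara\n')
    path = tmp.name

def read_rows(path):
    """Hypothetical helper: open the file, parse it with a fresh csv.reader, close it."""
    with open(path, newline='') as csvfile:
        return list(csv.reader(csvfile))

first_pass = read_rows(path)
second_pass = read_rows(path)   # new file object and new reader each time
os.remove(path)                 # clean up the temporary file

print(first_pass == second_pass)   # True
```

If the file is small enough to fit in memory, keeping the list from the first call and looping over that list twice avoids the second read altogether.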

answered Sep 24 '22 08:09 by kalhartt


Iterating over a csv.reader simply wraps iterating over the lines in the underlying file object. On each iteration the reader gets the next line from the file, parses it, and returns the resulting row.

So iterating over a csv.reader follows the same conventions as iterating over files. That is, once the file has reached its end, you have to seek back to the start before iterating a second time.
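
As a quick check of that claim, here is a small Python 3 sketch that uses an in-memory io.StringIO as a stand-in for a real file (an assumption made so the example needs no file on disk; the rows are made up).

```python
import csv
import io

csvfile = io.StringIO('Amy,James\nNathan,Sara\n')  # in-memory stand-in for a file
readit = csv.reader(csvfile)

first_pass = list(readit)    # reads every row
empty_pass = list(readit)    # the file is at its end, so no rows come back

csvfile.seek(0)              # rewind the underlying file object
second_pass = list(readit)   # the same reader yields the rows again

print(first_pass)    # [['Amy', 'James'], ['Nathan', 'Sara']]
print(empty_pass)    # []
print(second_pass)   # [['Amy', 'James'], ['Nathan', 'Sara']]
```

Note that readit.line_num keeps counting across the rewind, since the reader itself is never told that the file position changed.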

The below should do, though I haven't tested it:

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')

    for line in readit:
        print line

    # go back to the start of the file
    csvfile.seek(0)

    for line in readit:
        print 'foo'

answered Sep 26 '22 08:09 by sebastian