
Writing with Python's built-in .csv module

[Please note that this is a different question from the already answered How to replace a column using Python’s built-in .csv writer module?]

I need to do a find and replace (specific to one column of URLs) in a huge Excel .csv file. Since I'm in the beginning stages of trying to teach myself a scripting language, I figured I'd try to implement the solution in Python.

I'm having trouble when I try to write back to a .csv file after making a change to the contents of an entry. I've read the official csv module documentation about how to use the writer, but there isn't an example that covers this case. Specifically, I am trying to get the read, replace, and write operations accomplished in one loop. However, one cannot use the same 'row' reference in both the for loop's argument and as the parameter for writer.writerow(). So, once I've made the change in the for loop, how should I write back to the file?

edit: I implemented the suggestions from S. Lott and Jimmy, still the same result

edit #2: I added the "rb" and "wb" to the open() functions, per S. Lott's suggestion

import csv

#filename = 'C:/Documents and Settings/username/My Documents/PALTemplateData.xls'

csvfile = open("PALTemplateData.csv","rb")
csvout = open("PALTemplateDataOUT.csv","wb")
reader = csv.reader(csvfile)
writer = csv.writer(csvout)

changed = 0

for row in reader:
    row[-1] = row[-1].replace('/?', '?')
    writer.writerow(row)                  #this is the line that's causing issues
    changed=changed+1

print('Total URLs changed:', changed)

edit: For your reference, this is the new full traceback from the interpreter:

Traceback (most recent call last):
  File "C:\Documents and Settings\g41092\My Documents\palScript.py", line 13, in <module>
    for row in reader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
ignorantslut asked Jun 19 '09 21:06


2 Answers

You cannot read and write the same file.

source = open("PALTemplateData.csv","rb")
reader = csv.reader(source , dialect)

target = open("AnotherFile.csv","wb")
writer = csv.writer(target , dialect)

The normal approach to ALL file manipulation is to create a modified COPY of the original file. Don't try to update files in place. It's just a bad plan.
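As a rough illustration of the copy-first approach, here is a minimal sketch assuming Python 3. The helper name rewrite_csv_via_copy and the transform_row callback are hypothetical, not part of the csv module. The modified copy is written to a temporary file in the same directory, and the original is only swapped out once the copy is complete.

```python
import csv
import os
import tempfile


def rewrite_csv_via_copy(path, transform_row):
    """Write a modified copy of `path`, then swap it in for the original.

    `transform_row` is a hypothetical callback: it takes one row (a list of
    strings) and returns the row to write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temporary copy in the same directory so the final rename
    # does not have to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(suffix=".csv", dir=directory)
    try:
        with os.fdopen(fd, "w", newline="") as target, \
                open(path, "r", newline="") as source:
            reader = csv.reader(source)
            writer = csv.writer(target)
            for row in reader:
                writer.writerow(transform_row(row))
        # Both files are closed here; the original is replaced only now.
        os.replace(tmp_path, path)
    except Exception:
        # Something failed: discard the partial copy, leave the original alone.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the original file should stay untouched, skip the os.replace step and write to a differently named output file instead, as in the snippet above.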


Edit

In the lines

source = open("PALTemplateData.csv","rb")

target = open("AnotherFile.csv","wb")

In Python 2.x, the "rb" and "wb" are absolutely required. If you omit them, the file is opened in the wrong mode for the csv module.

You must use "rb" to read a .CSV file. There is no choice with Python 2.x. With Python 3.x, you can omit this, but use "r" explicitly to make it clear.

You must use "wb" to write a .CSV file. There is no choice with Python 2.x. With Python 3.x, you must use "w".


Edit

It appears you are using Python3. You'll need to drop the "b" from "rb" and "wb".

Read this: http://docs.python.org/3.0/library/functions.html#open
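For Python 3, the loop from the question might look roughly like this. It is a sketch, not a drop-in replacement: the function name fix_urls is made up for illustration, and the counter here only counts rows whose last column actually changed (the original counted every row). The newline="" argument is what the Python 3 csv documentation recommends when opening files for the csv module.

```python
import csv


def fix_urls(in_path, out_path):
    """Copy in_path to out_path, replacing '/?' with '?' in the last column.

    Returns the number of rows whose last column was changed.
    """
    changed = 0
    # Text mode ("r"/"w"), not binary; newline="" lets the csv module
    # handle line endings itself.
    with open(in_path, "r", newline="") as csvfile, \
            open(out_path, "w", newline="") as csvout:
        reader = csv.reader(csvfile)
        writer = csv.writer(csvout)
        for row in reader:
            if row:  # blank lines come back as empty lists
                fixed = row[-1].replace("/?", "?")
                if fixed != row[-1]:
                    changed += 1
                row[-1] = fixed
            writer.writerow(row)
    return changed
```

With the file names from the question, this would be called as fix_urls("PALTemplateData.csv", "PALTemplateDataOUT.csv").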

S.Lott answered Oct 13 '22 00:10


In Python 3, opening CSV files in binary mode is just wrong. CSV files are plain text files, so you need to open them with

source = open("PALTemplateData.csv","r")
target = open("AnotherFile.csv","w")

The error

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

occurs because you are opening them in binary mode.
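The difference can be reproduced without touching any files. This is a small sketch assuming Python 3, using in-memory streams in place of the real files; the sample data is invented for the demonstration.

```python
import csv
import io

# Made-up sample data standing in for the real file contents.
data = "id,url\r\n1,http://example.com/?a=1\r\n"

# Text stream: the reader is fed str lines and parses them normally.
text_rows = list(csv.reader(io.StringIO(data, newline="")))

# Binary stream: the reader is fed bytes, which csv.reader rejects in Python 3.
try:
    list(csv.reader(io.BytesIO(data.encode("utf-8"))))
    binary_error = None
except csv.Error as exc:
    binary_error = str(exc)

print(text_rows)
print(binary_error)
```

The exact wording of the error message may differ between Python 3 releases, but in each case it complains that the iterator returned bytes rather than strings.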

When I was opening Excel CSVs with Python, I used something like:

import csv

try:    # fall back to an empty list if the file cannot be opened
    f = csv.reader(open(filepath, "r", encoding="cp1250"), delimiter=";", quotechar='"')
except IOError:
    f = []

for record in f:
    pass  # do something with record

and it worked rather fast (I was opening two CSV files of about 10 MB each, though I did this with Python 2.6, not the 3.0 version).

There are a few modules for working with Excel CSV files from within Python; pyExcelerator is one of them.

zeroDivisible answered Oct 12 '22 23:10