I think this is probably something simple, but after an hour of searching, I've had no luck figuring out what I'm doing wrong.
I'm using the following code to read a CSV file - I have no problem reading the file, but when a line contains a field that is double-quoted because it contains the delimiter, the CSV reader ignores the double-quotes and parses the field into 2 separate fields.
Here's the code I'm using:
import csv

myReader = csv.reader(open(inPath, 'r'), dialect='excel', delimiter=',', quotechar='"')
for row in myReader:
    print row,
    print len(row)
My input:
hello, this is row 1, foo1
hello, this is row 2, foo2
goodbye, "this, is row 3", foo3
Which gives me:
['hello', ' this is row 1', ' foo1'] 3
['hello', ' this is row 2', ' foo2'] 3
['goodbye', ' "this', ' is row 3"', ' foo3'] 4
What do I need to change so it will recognize the double-quoted field as one field? I'm using python version 2.6.1.
Thanks!
quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote ("). escapechar specifies the character used to escape the delimiter character when quotes aren't used.
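To make the difference between the two parameters concrete, here is a minimal sketch. It is written for Python 3 (the question uses 2.6), reads from in-memory strings rather than a file, and the sample data and variable names are made up for illustration.

```python
import csv
import io

# Quoted field: the comma inside the quotes is part of the field.
quoted_row = next(csv.reader(io.StringIO('a,"b,c",d\n'), quotechar='"'))

# No quoting at all: a backslash (an assumed escape character for this
# example) marks the following comma as literal data.
escaped_row = next(
    csv.reader(
        io.StringIO('a,b\\,c,d\n'),
        quoting=csv.QUOTE_NONE,
        escapechar='\\',
    )
)

print(quoted_row)   # ['a', 'b,c', 'd']
print(escaped_row)  # ['a', 'b,c', 'd']
```

Both calls give the same three fields. Only the way the embedded comma is protected in the input differs.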
If you look at the dialect that you're using, you'll notice that the excel dialect is configured as follows:
class excel(Dialect):
    """Describe the usual properties of Excel-generated CSV files."""
    delimiter = ','
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = '\r\n'
    quoting = QUOTE_MINIMAL
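Notice that skipinitialspace is set to False. Each comma in your input is followed by a space, so the quote is not the first character of the field and the reader doesn't treat it as the start of a quoted field. Just pass skipinitialspace=True into your reader.
Oh, and by the way, all the arguments you've passed in are already the defaults when using the excel dialect, which is itself the default dialect for csv.reader.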
So, I would re-write your code like so:
>>> import csv
>>> with open(inPath) as fp:
...     reader = csv.reader(fp, skipinitialspace=True)
...     for row in reader:
...         print row,
...         print len(row)
...
['hello', 'this is row 1', 'foo1'] 3
['hello', 'this is row 2', 'foo2'] 3
['goodbye', 'this, is row 3', 'foo3'] 3
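If you want to check this without a file on disk, here is a small self-contained sketch. It assumes Python 3 (so print is a function) and feeds the question's sample input through io.StringIO. The parse helper is just a hypothetical convenience wrapper, not part of the csv module.

```python
import csv
import io

# Sample input from the question, held in memory instead of a file.
SAMPLE = (
    'hello, this is row 1, foo1\n'
    'hello, this is row 2, foo2\n'
    'goodbye, "this, is row 3", foo3\n'
)


def parse(text, **reader_options):
    """Parse CSV text and return its rows as a list of lists."""
    return list(csv.reader(io.StringIO(text), **reader_options))


# Default excel dialect: skipinitialspace is False.
default_rows = parse(SAMPLE)

# Same input, but spaces right after each delimiter are ignored.
fixed_rows = parse(SAMPLE, skipinitialspace=True)

print(default_rows[2], len(default_rows[2]))
print(fixed_rows[2], len(fixed_rows[2]))
```

The first print shows the row split into 4 fields, as in the question. The second shows the quoted field kept whole, for 3 fields.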
It's because your csv has spaces before the quotes:
one0, one1, one2
two0, two1, two2
tre0, "tr,e1", tre2
vs
one0,one1,one2
two0,two1,two2
tre0,"tr,e1",tre2
You'll need to remove those extra spaces first.
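Here is a rough sketch of that comparison, plus one naive way to do the cleanup. It assumes Python 3 and in-memory strings. strip_space_before_quotes is a hypothetical helper for this example: it only removes spaces or tabs between a comma and an opening quote, and it could misfire on data that contains a comma, a space and a quote inside a quoted field.

```python
import csv
import io
import re

WITH_SPACES = 'one0, one1, one2\ntwo0, two1, two2\ntre0, "tr,e1", tre2\n'
WITHOUT_SPACES = 'one0,one1,one2\ntwo0,two1,two2\ntre0,"tr,e1",tre2\n'


def parse(text):
    """Parse CSV text with the default reader settings."""
    return list(csv.reader(io.StringIO(text)))


def strip_space_before_quotes(text):
    """Naive cleanup: drop spaces/tabs between a comma and an opening quote."""
    return re.sub(r',[ \t]+"', ',"', text)


spaced_rows = parse(WITH_SPACES)
clean_rows = parse(WITHOUT_SPACES)
repaired_rows = parse(strip_space_before_quotes(WITH_SPACES))

print(spaced_rows[2])    # quoted field is split in two
print(clean_rows[2])     # quoted field stays whole
print(repaired_rows[2])  # quoted field stays whole; other leading spaces remain
```

Note that the cleanup only touches the space before the quote, so the unquoted fields keep their leading spaces.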