Why is the Python CSV reader ignoring double-quoted fields?

I think this is probably something simple, but after an hour of searching, I've had no luck figuring out what I'm doing wrong.

I'm using the following code to read a CSV file - I have no problem reading the file, but when a line contains a field that is double-quoted because it contains the delimiter, the CSV reader ignores the double-quotes and parses the field into 2 separate fields.

Here's the code I'm using:

myReader = csv.reader(open(inPath, 'r'), dialect='excel', delimiter=',', quotechar='"')
for row in myReader:
    print row,
    print len(row)

My input:

hello, this is row 1, foo1
hello, this is row 2, foo2
goodbye, "this, is row 3", foo3

Which gives me:

['hello', ' this is row 1', ' foo1'] 3
['hello', ' this is row 2', ' foo2'] 3
['goodbye', ' "this', ' is row 3"', ' foo3'] 4

What do I need to change so it will recognize the double-quoted field as one field? I'm using python version 2.6.1.

Thanks!

asked Jul 29 '11 22:07

jamz

2 Answers

If you look at the dialect that you're using, you'll notice that the excel dialect is configured as follows:

class excel(Dialect):
    """Describe the usual properties of Excel-generated CSV files."""
    delimiter = ','
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = '\r\n'
    quoting = QUOTE_MINIMAL

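Notice that skipinitialspace is set to False, so the space after each comma is kept as part of the next field. In row 3 the field therefore starts with a space rather than with the quote character, and the reader treats the quote as ordinary text. Pass skipinitialspace=True to your reader to fix this. By the way, all the parameters you've passed in are already the defaults of the excel dialect, which is itself the default dialect for csv.reader.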

So, I would re-write your code like so:

>>> import csv
>>> with open(inPath) as fp:
...     reader = csv.reader(fp, skipinitialspace=True)
...     for row in reader:
...         print row,
...         print len(row)
...
['hello', 'this is row 1', 'foo1'] 3
['hello', 'this is row 2', 'foo2'] 3
['goodbye', 'this, is row 3', 'foo3'] 3
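To compare the two settings without an input file, here is a minimal sketch that feeds the question's three rows to csv.reader from a list. The names SAMPLE_LINES and parse are made up for this illustration. It relies on csv.reader accepting any iterable that yields one line at a time.

```python
import csv

# Rows copied from the question, held in a list so no input file is needed.
# csv.reader accepts any iterable that yields one line per iteration.
SAMPLE_LINES = [
    'hello, this is row 1, foo1',
    'hello, this is row 2, foo2',
    'goodbye, "this, is row 3", foo3',
]


def parse(lines, skipinitialspace):
    """Parse lines with the default excel dialect, toggling skipinitialspace."""
    return list(csv.reader(lines, skipinitialspace=skipinitialspace))


# Default behaviour: the space after the comma belongs to the field,
# so the quote is not at the start of the field and is kept as text.
default_rows = parse(SAMPLE_LINES, skipinitialspace=False)

# With skipinitialspace=True the space is dropped and the quotes are honoured.
fixed_rows = parse(SAMPLE_LINES, skipinitialspace=True)

print(default_rows[2])  # ['goodbye', ' "this', ' is row 3"', ' foo3']
print(fixed_rows[2])    # ['goodbye', 'this, is row 3', 'foo3']
```

Only the skipinitialspace argument differs between the two calls, so the change in the third row comes from that setting alone.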

answered Oct 19 '22 23:10

Mahmoud Abdelkader

It's because your csv has spaces before the quotes:

one0, one1, one2
two0, two1, two2
tre0, "tr,e1", tre2

Without the spaces, the quoted field is parsed as a single field:

one0,one1,one2
two0,two1,two2
tre0,"tr,e1",tre2

You'll need to remove those extra spaces first.
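If you can't change how the file is produced, one rough way to strip those spaces before parsing is a regular expression. This is only a sketch: the helper name strip_space_before_quotes is made up here, and the pattern assumes that a comma followed by spaces and a double quote never occurs inside a quoted field, which may not hold for your data.

```python
import csv
import re

# Matches a comma, then spaces or tabs, then an opening double quote.
# Assumption: this sequence never appears inside a quoted field.
_SPACE_BEFORE_QUOTE = re.compile(r',[ \t]+"')


def strip_space_before_quotes(line):
    """Remove whitespace between a delimiter and an opening quote."""
    return _SPACE_BEFORE_QUOTE.sub(',"', line)


raw_lines = [
    'one0, one1, one2',
    'two0, two1, two2',
    'tre0, "tr,e1", tre2',
]

cleaned_lines = [strip_space_before_quotes(line) for line in raw_lines]
rows = list(csv.reader(cleaned_lines))

print(rows[2])  # ['tre0', 'tr,e1', ' tre2']
```

Note that this only touches spaces in front of quotes, so unquoted fields such as ' tre2' keep their leading space.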

answered Oct 19 '22 23:10

TorelTwiddler
