Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python CSV module - quotes go missing

I have a CSV file that has data like this

15,"I",2,41301888,"BYRNESS RAW","","BYRNESS VILLAGE","NORTHUMBERLAND","ENG"
11,"I",3,41350101,2,2935,2,2008-01-09,1,8,0,2003-02-01,,2009-12-22,2003-02-11,377016.00,601912.00,377105.00,602354.00,10

I am reading this and then writing different rows to different CSV files.

However, in the original data there are quotes around the non-numeric fields, as some of them contain commas within the field.

I am not able to keep the quotes.

I have researched lots and discovered the quoting=csv.QUOTE_NONNUMERIC however this now results in a quote mark around every field and I dont know why??

If i try one of the other quoting options like MINIMAL I end up with an error message regarding the date value, 2008-01-09, not being a float.

I have tried to create a dialect, add the quoting on the csv reader and writer but nothing I have tried results in the getting an exact match to the original data.

Anyone had this same problem and found a solution.

like image 614
tjmgis Avatar asked Feb 19 '12 22:02

tjmgis


People also ask

Why does my csv file have quotation marks?

Quotation marks are used as text qualifiers Quotation marks appear in CSV files as text qualifiers. This means, they function to wrap together text that should be kept as one value, versus what are distinct values that should be separated out.

How do I show a double quote in a csv file?

Double-quote escape characters There are 2 accepted ways of escaping double-quotes in a CSV file. One is using a 2 consecutive double-quotes to denote 1 literal double-quote in the data. The alternative is using a backslash and a single double-quote.

What is Quotechar in CSV Python?

quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote ( ' " ' ). escapechar specifies the character used to escape the delimiter character, in case quotes aren't used.

What is the significance of Quote_nonnumeric?

QUOTE_NONNUMERIC Instructs writer objects to quote all non-numeric fields. Instructs the reader to convert all non-quoted fields to type float.


2 Answers

When writing, quoting=csv.QUOTE_NONNUMERIC keeps values unquoted as long as they're numbers, ie. if their type is int or float (for example), which means it will write what you expect.

Your problem could be that, when reading, a csv.reader will turn every row it reads into a list of strings (if you read the documentation carefully enough, you'll see a reader does not perform automatic data type conversion!

If you don't perform any kind of conversion after reading, then when you write you'll end up with everything on quotes... because everything you write is a string.

Edit: of course, date fields will be quoted, because they are not numbers, meaning you cannot get the exact expected behaviour using the standard csv.writer.

like image 100
Ricardo Cárdenes Avatar answered Sep 28 '22 06:09

Ricardo Cárdenes


Are you sure you have a problem? The behavior you're describing is correct: The csv module will enclose strings in quotes only if it's necessary for parsing them correctly. So you should expect to see quotes only around strings containing a comma, newlines, etc. Unless you're getting errors reading your output back in, there is no problem.

like image 28
alexis Avatar answered Sep 28 '22 05:09

alexis