
CSV files with quote and comma chars inside fields

Tags:

python

csv

quote

I have a stack of CSV files I want to parse. The problem is that half of them have quote marks used as quote marks, and commas, inside the main field. They are not really CSV, but they do have a fixed number of fields that are identifiable. The dialect=csv."excel" setting works perfectly on files without the extra " and , chars inside the field.

This data is old/unsupported. I am trying to push some life into it.

e.g.

"AAAAA
AAAA
AAAA
AAAA","AAAAAAAA


AAAAAA
AAAAA "AAAAAA" AAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAA, AAAAA
AAAAAAAAA AAAAA AAAAAAAAAA
AAAAA, "AAAAA", AAAAAAAAA
AAAAAAAA AAAAAAAA
AAAAAAA
"

This trips up the file parser, which throws the error _csv.Error: newline inside string. I narrowed it down to this being the issue by removing the quote marks from inside the 2nd field, after which csv.reader parses the file OK.

Some of the fields are multi-line; I'm not sure if that's important to know.

I have been poking around at the dialect settings, and whilst I can find 'skipinitialspace', this doesn't seem to solve the problem.

To be clear: this is not valid CSV. It's data objects that loosely follow a CSV structure, but have , and " chars inside the field text.

The lineterminator is \x0d\x0a

I have tried a number of different permutations of doublequote and the quoting variable in the dialect settings, but I can't get this to parse correctly.

I cannot be confident that a ," or ", combination exists only on field boundaries.

This problem only exists for one (the last) of several fields in the file, and there are several thousand files.
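Since the field count is fixed and only the last field is affected, one option is to bypass csv.reader for these files and split each record by hand. The following is a minimal sketch, not a tested solution for this data. It assumes each record is wrapped in double quotes, that fields are separated by the exact sequence `","`, and that the earlier fields never contain that sequence. The names `split_record` and `n_fields` are hypothetical.

```python
def split_record(text, n_fields):
    """Split one loosely CSV-like record into exactly n_fields fields.

    Assumptions (not confirmed by the real data): the record starts and
    ends with a double quote, fields are separated by the three
    characters '","', and only the last field may contain stray quotes
    or commas.
    """
    body = text.strip("\r\n")
    if not (body.startswith('"') and body.endswith('"')):
        raise ValueError("record is not wrapped in double quotes")
    # Drop the outer quotes, then split on the first n_fields - 1
    # separators only; everything after that is the last field, untouched.
    fields = body[1:-1].split('","', n_fields - 1)
    if len(fields) != n_fields:
        raise ValueError("expected %d fields, found %d" % (n_fields, len(fields)))
    return fields


# Made-up record shaped like the example above: two fields, the second
# one multi-line with embedded quotes and commas.
record = '"AAAA\r\nAAAA","BBB "CCC" BBB\r\nDDD, "EEE", FFF\r\n"\r\n'
fields = split_record(record, 2)
print(fields)
```

Because the split stops after `n_fields - 1` separators, any `","` that happens to occur inside the last field is left alone. If a file holds more than one record, the record boundaries would have to be found first, which this sketch does not attempt.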

Jay Gattuso asked Feb 10 '12 23:02




1 Answer

Have you tried passing csv.QUOTE_NONE via the quoting keyword arg? Without having some code or data to test this on, I have no way to know whether this actually works on your data, but it seems to work with the fragment you provided.

>>> import csv
>>> r = csv.reader(open('foo.csv', newline=''), quoting=csv.QUOTE_NONE)
>>> for row in r: print(row)
... 
['"A"', '"B"', '"ccc "ccccccc" cccccc"']
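To see what csv.QUOTE_NONE does and does not handle, here is a small self-contained sketch for Python 3 that reads from an in-memory string instead of a file. The sample text is made up for illustration. With QUOTE_NONE the reader does no special processing of quote characters, so they stay in the values, and every comma is still treated as a delimiter.

```python
import csv
import io

# Made-up sample: the first line matches the fragment above, the second
# has commas and quotes inside its last field.
sample = '"A","B","ccc "ccccccc" cccccc"\r\n"D","e, "f", g"\r\n'

# newline='' keeps the \r\n line endings untranslated for the csv module.
reader = csv.reader(io.StringIO(sample, newline=''), quoting=csv.QUOTE_NONE)
rows = list(reader)
for row in rows:
    print(row)
```

The first row comes out as three fields with their quotes intact, but the second row is split into four pieces at the embedded commas. Line breaks inside a field would likewise start a new row, so with data like the question's, the pieces would probably still need to be rejoined and the outer quotes stripped afterwards.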
senderle answered Nov 15 '22 16:11