I have a stack of CSV files I want to parse. The problem is that half of them have literal quote marks and commas inside the main field. They are not really CSV, but they do have a fixed number of identifiable fields. The dialect=csv.excel setting works perfectly on files without the extra " and , characters inside the field.
This data is old/unsupported. I am trying to push some life into it.
e.g.
"AAAAA
AAAA
AAAA
AAAA","AAAAAAAA
AAAAAA
AAAAA "AAAAAA" AAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAA, AAAAA
AAAAAAAAA AAAAA AAAAAAAAAA
AAAAA, "AAAAA", AAAAAAAAA
AAAAAAAA AAAAAAAA
AAAAAAA
"
This trips up the file parser, which throws an error: _csv.Error: newline inside string. I narrowed it down to this by removing the quote marks from inside the 2nd field, after which csv.reader parses the file OK.
Some of the fields are multi-line - I'm not sure if that's important to know.
I have been poking around at the dialect settings, and whilst I can find 'skipinitialspace', this doesn't seem to solve the problem.
To be clear - this is not valid 'CSV'. It's data objects that loosely follow a CSV structure, but have , and " chars inside the field text.
The lineterminator is \x0d\x0a
I have tried a number of different permutations of the doublequote and quoting settings in the dialect, but I can't get this to parse correctly.
I cannot be confident that a ," or ", combination exists only on field boundaries.
This problem only exists for one (the last) of several fields in the file, and there are several thousand files.
For me, the answer is, "Because when I export data into a CSV file, the commas in a field disappear and my field gets separated into multiple fields where the commas appear in the original data." (That is because the comma is the CSV field separator character.)
The CSV format uses commas to separate values. Values that contain carriage returns, linefeeds, commas, or double quotes are surrounded by double quotes. Values that contain double quotes are quoted, and each literal quote is escaped by an immediately preceding quote. For example, the 3 values:
Hello, you can change your file format to "CSV". You need to do the following: select "Save as type", then click on the dropdown arrow to choose the new format, e.g., CSV file (UTF-8) Comma delimited. I hope the above instructions help you.
Depending on your situation, semicolons may also be used as CSV field separators. Given my requirements, I can use a character that looks like a comma, e.g., the single low-9 quotation mark. The second comma-looking character in the Replace function is decimal 8218.
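The quoting rule described above (fields containing commas, quotes, or line breaks get wrapped in double quotes, and literal quotes are doubled) means a well-formed exporter does not need to swap commas for a look-alike character. As a minimal sketch, assuming Python 3 and its standard csv module, with made-up values chosen only for illustration:

```python
import csv
import io

# Illustrative values only: a comma, embedded quotes, and a line break.
row = ["plain", "has, comma", 'says "hi"', "two\nlines"]

buf = io.StringIO()
csv.writer(buf).writerow(row)  # default "excel" dialect, QUOTE_MINIMAL
encoded = buf.getvalue()
print(repr(encoded))

# Reading the text back recovers the original values unchanged.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))
assert decoded == row
```

Only the fields that need it are quoted, and the embedded quotes come out as `""`. Files that break this rule, like the ones in the question, are what the reader chokes on.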
Have you tried passing csv.QUOTE_NONE via the quoting keyword arg? Without having some code or data to test this on, I have no way to know whether this actually works on your data, but it seems to work with the fragment you provided.
>>> import csv
>>> r = csv.reader(open('foo.csv', 'rb'), quoting=csv.QUOTE_NONE)
>>> for row in r: print row
...
['"A"', '"B"', '"ccc "ccccccc" cccccc"']
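One caveat: with csv.QUOTE_NONE the reader treats every comma and every line break as a separator, so multi-line fields with embedded commas come back in pieces and keep their surrounding quote marks. Since the question says the field count is fixed and only the last field is malformed, another option might be to skip the csv module for those records and cut on the first few `","` boundaries by hand. This is only a rough sketch for Python 3. The function name `split_record` and the sample text are hypothetical, and it assumes each record is available as one string (for example one record per file), every field is wrapped in double quotes, and the clean leading fields never contain the sequence `","`:

```python
def split_record(text, n_fields):
    """Split one loosely CSV-like record into exactly n_fields strings.

    Assumptions (not confirmed by the original post): `text` holds a single
    record, every field is wrapped in double quotes, and only the last field
    may contain stray quotes or commas.
    """
    body = text.rstrip("\r\n")
    if len(body) < 2 or not (body.startswith('"') and body.endswith('"')):
        raise ValueError("record is not wrapped in double quotes")
    # Drop the outer quotes, then cut on the first n_fields - 1 boundaries;
    # everything after the last cut is kept verbatim as the final field.
    fields = body[1:-1].split('","', n_fields - 1)
    if len(fields) != n_fields:
        raise ValueError("expected %d fields, found %d" % (n_fields, len(fields)))
    return fields


# Made-up record shaped like the fragment in the question.
sample = '"AAAA\r\nAAAA","BBBB\r\nBB "BBB" BB\r\nBBB, "BB", BBB\r\n"\r\n'
fields = split_record(sample, 2)
print(fields)
```

When reading the real files, opening them with `open(path, newline="")` keeps the \x0d\x0a line endings inside the fields intact. If a file holds several records, the record boundaries would have to be found some other way first, which this sketch does not attempt.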