
CSVs in Python with newline in quotes [duplicate]

Tags: python, csv

I get the impression that this is a common problem: I have a CSV file with newlines inside some of the fields. I am looking for a fix within Python, and within the csv module if possible.

Here is an example file that I have created:

$ more test_csv.csv
a,"b",c,d,"e
e
e",f
a,bb,c,d,ee ,"f
f"
a,b,"c
c",d,e,f

Not all fields will be wrapped in quotes. My quoting is random in this example, but the actual file should match quoting=csv.QUOTE_MINIMAL.

The output should resemble:

[[a,b,c,d,"e\ne\ne",f],[a,bb,c,d,ee,"f\nf"],[a,b,"c\nc",d,e,f]]

Instead I am getting:

[[['a', 'b', 'c', 'd', 'e\n']], [['e']], [['e"', 'f']], [['a', 'bb', 'c', 'd', 'ee ', 'f\n']], [['f"']], [['a', 'b', 'c\n']], [['c"', 'd', 'e', 'f']]]

Please focus on the number of rows and columns. Another concern is that in the third row a quote character was included when it should not have been.

Here is my code so far:

import csv

file = open('test_csv.csv', 'r')
rows = []
for line in file:
  fields = []  
  mycsv = csv.reader([line], dialect='excel', \
    quotechar='"', quoting=csv.QUOTE_MINIMAL)
  for field in mycsv:
    fields.append(field)
  rows.append(fields)

Thank you.

artdv asked Sep 10 '13 17:09



1 Answer

Instead of splitting the lines yourself, let csv.reader do it:

>>> from StringIO import StringIO
>>> import csv
>>> file = StringIO("""a,"b",c,d,"e
e
e",f
a,bb,c,d,ee ,"f
f"
a,b,"c
c",d,e,f""")
>>> for line in csv.reader(file):
    print line

['a', 'b', 'c', 'd', 'e\ne\ne', 'f']
['a', 'bb', 'c', 'd', 'ee ', 'f\nf']
['a', 'b', 'c\nc', 'd', 'e', 'f']

Further explanation: By looping over the lines yourself and creating a reader for each line, you are treating each line as if it were a separate and complete CSV file, so a quoted field that spans several lines gets cut at every line break. Instead, you want to treat the whole file as one CSV document. You can do this either by passing the file object to csv.reader, since iterating over a file object yields the lines of the file, or by reading the file yourself, splitting it into lines, and passing the list of all the lines to a single csv.reader.
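
For readers on Python 3, here is a minimal sketch of both options, using the sample data from the question. The helper names read_rows and read_rows_from_lines are made up for illustration, and io.StringIO stands in for the real file. With an actual file, the csv documentation recommends opening it with newline='' so that newlines embedded in quoted fields are passed through unchanged.

```python
import csv
import io

# Same sample data as in the question; io.StringIO stands in for the file.
# With a real file you would use: open('test_csv.csv', newline='')
SAMPLE = 'a,"b",c,d,"e\ne\ne",f\na,bb,c,d,ee ,"f\nf"\na,b,"c\nc",d,e,f\n'


def read_rows(stream):
    """Option 1: hand the whole stream to a single csv.reader."""
    return list(csv.reader(stream))


def read_rows_from_lines(text):
    """Option 2: split the text into lines and pass the list to one reader.

    keepends=True keeps each line's '\n' so that newlines inside quoted
    fields are still present when the reader joins the pieces.
    """
    return list(csv.reader(text.splitlines(keepends=True)))


rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

Both helpers give three rows of six fields each for this input. The point in either case is that one reader sees every line, so it can carry an open quoted field across line breaks.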

Claudiu answered Nov 15 '22 08:11
