
How to read a CSV line with "?

Tags:

python

csv

A trivial CSV line can be split with the string split function. But some lines contain double-quoted fields that themselves contain commas, e.g.

"good,morning", 100, 300, "1998,5,3"

so splitting the line directly on commas does not solve the problem.

My solution is to first split the line on commas and then re-join the pieces that begin or end with a double quote.
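One possible reading of that split-and-recombine idea, as a minimal Python 3 sketch. The helper name `naive_split` is hypothetical, and it deliberately handles only the simple case: it does not understand escaped quotes (`""`).

```python
def naive_split(line):
    # Hypothetical helper: split on commas, then re-join pieces that were
    # separated inside a double-quoted field. Escaped quotes ("") are NOT handled.
    fields, pending = [], None   # pending: pieces of an unfinished quoted field
    for piece in line.split(","):
        s = piece.strip()
        if pending is None:
            if s.startswith('"') and not (len(s) > 1 and s.endswith('"')):
                pending = [piece]                 # opening quote, no closing quote yet
            else:
                fields.append(piece)              # complete field
        else:
            pending.append(piece)
            if s.endswith('"'):                   # closing quote found
                fields.append(",".join(pending))  # put the commas back
                pending = None
    if pending is not None:                       # unterminated quote: keep the rest
        fields.append(",".join(pending))
    return [f.strip().strip('"') for f in fields]

print(naive_split('"good,morning", 100, 300, "1998,5,3"'))
# ['good,morning', '100', '300', '1998,5,3']
```

This works for the sample line, but any piece that happens to begin and end with a quote is taken as a complete field, which is where this approach gets fragile.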

What's the best practice for this problem?

I'd be interested in a Python or F# code snippet for this.

EDIT: I am more interested in the implementation details than in using a library.

asked Jan 26 '10 at 13:01 by Yin Zhu


2 Answers

There's a csv module in Python, which handles this.
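A minimal sketch of the csv module applied to the line from the question, assuming Python 3. Passing `skipinitialspace=True` is a choice made here because the sample has a space after each comma; without it, the quote before `1998` would not be recognized as opening a quoted field.

```python
import csv

line = '"good,morning", 100, 300, "1998,5,3"'

# csv.reader accepts any iterable of strings, so a one-element list
# parses a single line. skipinitialspace=True ignores the space after
# each comma, so the quote that follows it opens a quoted field.
row = next(csv.reader([line], skipinitialspace=True))
print(row)  # ['good,morning', '100', '300', '1998,5,3']
```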

Edit: This task falls into the "build a lexer" category. The standard way to handle such tasks is to build a state machine (or use a lexer library/framework that does it for you).

The state machine for this task would probably need only two states:

  • The initial state, where it treats every character except comma and newline as part of the field (exception: leading and trailing spaces), a comma as the field separator, and a newline as the record separator. When it encounters an opening quote, it goes into the
  • read-quoted-field state, where every character except a quote (including comma and newline) is treated as part of the field. A quote not followed by another quote marks the end of the quoted field (back to the initial state); a quote followed by another quote is treated as a single literal quote (an escaped quote).
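The two states above can be sketched in Python 3 roughly as follows. This is an illustrative sketch under a few assumptions, not a complete RFC 4180 parser: `parse_csv` is a hypothetical name, records end with `\n` only (a `\r` would stay in the last field), and leading/trailing spaces are trimmed only when they sit outside the quotes.

```python
def parse_csv(text):
    """Two-state CSV lexer sketch: returns a list of records (lists of fields)."""
    records, record, field = [], [], []
    keep = 0            # field[:keep] came from inside quotes; never trim it
    in_quotes = False   # False = initial state, True = read-quoted-field state
    i, n = 0, len(text)

    def finish_field():
        # Trim trailing spaces that appeared outside the quotes.
        end = len(field)
        while end > keep and field[end - 1] == ' ':
            end -= 1
        record.append(''.join(field[:end]))

    while i < n:
        c = text[i]
        if in_quotes:                        # read-quoted-field state
            if c == '"' and i + 1 < n and text[i + 1] == '"':
                field.append('"')            # "" is an escaped quote
                i += 1
            elif c == '"':
                in_quotes = False            # closing quote: back to initial state
                keep = len(field)
            else:
                field.append(c)              # commas and newlines are literal here
        elif c == '"':
            in_quotes = True                 # opening quote
        elif c == ',':
            finish_field()                   # field separator
            field, keep = [], 0
        elif c == '\n':
            finish_field()                   # record separator
            records.append(record)
            record, field, keep = [], [], 0
        elif c == ' ' and not field:
            pass                             # skip leading spaces
        else:
            field.append(c)
        i += 1

    if field or record:                      # last record without a trailing newline
        finish_field()
        records.append(record)
    return records


print(parse_csv('"good,morning", 100, 300, "1998,5,3"'))
# [['good,morning', '100', '300', '1998,5,3']]
```

Reading the whole text rather than one line at a time is what lets a quoted field span a newline, e.g. `parse_csv('a,"b\nc"\nd,e')` gives two records, `['a', 'b\nc']` and `['d', 'e']`.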

By the way, your concatenating solution will break on "Field1","Field2" or "Field1"",""Field2".
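A quick Python 3 check of those two inputs shows why. In the second one, every comma sits inside the quotes, so the whole line is a single field; yet each comma-split piece begins and ends with a quote, so looking only at the first and last character of each piece cannot tell it apart from two ordinary quoted fields.

```python
import csv

# The two inputs from the answer, each a single CSV record.
for line in ['"Field1","Field2"', '"Field1"",""Field2"']:
    pieces = line.split(",")                 # what a plain comma split sees
    parsed = next(csv.reader([line]))        # what a real CSV parser returns
    print(pieces, "->", parsed)

# ['"Field1"', '"Field2"'] -> ['Field1', 'Field2']
# ['"Field1""', '""Field2"'] -> ['Field1","Field2']
```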

answered Oct 17 '22 at 02:10 by Rafał Dowgird


From Python's csv module documentation:

Reading a normal CSV file:

import csv
# Python 3: open in text mode with newline="" (the old "rb" mode was Python 2)
with open("some.csv", newline="") as f:
    for row in csv.reader(f):
        print(row)

Reading a file with an alternate format:

import csv
# Colon-separated file, quote characters treated as ordinary text
with open("passwd", newline="") as f:
    reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    for row in reader:
        print(row)

There are some nice usage examples on LinuxJournal.com.

If you're interested in the details, read the question "split string at commas respecting quotes when string not in csv format", which shows some nice regular expressions for this problem, or simply read the csv module source.
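As a rough illustration of the regex route, here is one common idiom (an assumption on my part, not necessarily the expression used in that question): split only at commas followed by an even number of double quotes, then strip and unquote each piece. `SPLIT_RE` and `regex_split` are hypothetical names; this works one line at a time, cannot handle newlines inside quotes, and its lookahead rescans the rest of the line at every comma, so it suits short lines only.

```python
import re

# A comma is a separator only if an even number of double quotes follows it,
# i.e. it is not inside a quoted field. Escaped quotes ("") come in pairs,
# so they do not change the parity.
SPLIT_RE = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def regex_split(line):
    fields = []
    for raw in SPLIT_RE.split(line):
        f = raw.strip()
        if len(f) >= 2 and f[0] == f[-1] == '"':
            f = f[1:-1].replace('""', '"')   # unquote and unescape
        fields.append(f)
    return fields

print(regex_split('"good,morning", 100, 300, "1998,5,3"'))
# ['good,morning', '100', '300', '1998,5,3']
```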

answered Oct 17 '22 at 02:10 by Adam Matan