A trivial CSV line could be spitted using string split function. But some lines could have "
, e.g.
"good,morning", 100, 300, "1998,5,3"
thus directly using string split would not solve the problem.
My solution is to first split out the line using ,
and then combining the strings with "
at then begin or end of the string.
What's the best practice for this problem?
I am interested if there's a Python or F# code snippet for this.
EDIT: I am more interested in the implementation detail, rather than using a library.
There's a csv module in Python, which handles this.
Edit: This task falls into "build a lexer" category. The standard way to do such tasks is to build a state machine (or use a lexer library/framework that will do it for you.)
The state machine for this task would probably only need two states:
By the way, your concatenating solution will break on "Field1","Field2"
or "Field1"",""Field2"
.
From python's CSV module:
reading a normal CSV file:
import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
print row
Reading a file with an alternate format:
import csv
reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE)
for row in reader:
print row
There are some nice usage examples in LinuxJournal.com.
If you're interested with the details, read "split string at commas respecting quotes when string not in csv format" showing some nice regexen related to this problem, or simply read the csv module source.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With