
Transform comma separated string into a list but ignore comma in quotes

How do I convert "1,,2'3,4'" into a list? Commas separate the individual items, unless they are within quotes. In that case, the comma is to be included in the item.

This is the desired result: ['1', '', '2', '3,4']. One regex I found on another thread to ignore the quotes is as follows:

re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')

But this gives me this output:

['', '1', ',,', "2'3,4'", '']

I can't understand where these extra empty strings are coming from, or why the two commas are being printed at all, let alone together.
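That output is consistent with the pattern being passed to re.split (an assumption; the post does not show the call). re.split treats every match as a delimiter and, when the pattern contains a capturing group, also returns the matched text. The ',,' would then be the unmatched gap between the two matches, and the empty strings the zero-length gaps before the first match and after the last one. A minimal sketch contrasting re.split with re.findall:

```python
import re

# Pattern quoted in the question: one or more of
#   a character that is not a comma or quote | a "..." run | a '...' run
pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')

s = "1,,2'3,4'"

# split(): matches act as delimiters; the capturing group puts the
# matched text back into the result, between the leftover pieces.
split_result = pattern.split(s)

# findall(): returns only the matched items, nothing in between.
found = pattern.findall(s)

print(split_result)  # ['', '1', ',,', "2'3,4'", '']
print(found)         # ['1', "2'3,4'"]
```

Note that findall drops the empty field between the two commas, because the pattern needs at least one character to match.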

I tried making this regex myself:

re.compile(r'''(, | "[^"]*" | '[^']*')''')

This did not detect anything and just gave me back my original string.

I don't understand why. Shouldn't it detect the commas at the very least? The same problem occurs if I add a ? after the comma.
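One likely reason: without the re.VERBOSE flag, the spaces inside the pattern are literal characters. The alternatives then require a comma followed by a space, or a quoted run with a space next to it, and none of those occur in the input. A minimal sketch, assuming re.split is the call being used (the post does not say):

```python
import re

s = "1,,2'3,4'"
raw = r'''(, | "[^"]*" | '[^']*')'''

# Default mode: every space in the pattern must match a space in the input.
literal_spaces = re.compile(raw)

# re.VERBOSE: whitespace outside character classes is ignored,
# so the pattern behaves like (,|"[^"]*"|'[^']*').
ignored_spaces = re.compile(raw, re.VERBOSE)

no_match = literal_spaces.split(s)
with_verbose = ignored_spaces.split(s)

print(no_match)      # ["1,,2'3,4'"]
print(with_verbose)  # ['1', ',', '', ',', '2', "'3,4'", '']
```

Even with the spaces ignored, the result is not the desired list: the capturing group makes split return the delimiters as well, and the quotes stay attached to '3,4'.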

limasxgoesto0 asked Aug 04 '12 02:08


2 Answers

Instead of a regular expression, you might be better off using the csv module since what you are dealing with is a CSV string:

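from csv import reader
from io import StringIO  # Python 3; on Python 2 this was cStringIO.StringIO

# Wrap the string in a file-like object so csv.reader can iterate over it
file_like_object = StringIO("1,,2,'3,4'")
# quotechar="'" tells the reader that single quotes enclose a field
csv_reader = reader(file_like_object, quotechar="'")
for row in csv_reader:
    print(row)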

This results in the following output:

['1', '', '2', '3,4']
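The same approach can be wrapped in a small helper. This is only a sketch for Python 3, and the name split_quoted is made up for illustration. It relies on csv.reader accepting any iterable of strings, so a one-element list can stand in for the file-like object:

```python
import csv

def split_quoted(line, quotechar="'"):
    """Split one comma-separated line, keeping commas inside quotes."""
    # csv.reader iterates over lines; next() pulls out the single parsed row.
    return next(csv.reader([line], quotechar=quotechar))

fields = split_quoted("1,,2,'3,4'")
print(fields)  # ['1', '', '2', '3,4']
```

The quotes only group a field when they enclose the whole field, which is why the input here has a comma after the 2.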
Sean Vieira answered Sep 27 '22 22:09


pyparsing includes a predefined expression for comma-separated lists:

>>> from pyparsing import commaSeparatedList
>>> s = "1,,2'3,4'"
>>> print(commaSeparatedList.parseString(s).asList())
['1', '', "2'3", "4'"]

Hmm, looks like you have a typo in your data, missing a comma after the 2:

>>> s = "1,,2,'3,4'"
>>> print(commaSeparatedList.parseString(s).asList())
['1', '', '2', "'3,4'"]
PaulMcG answered Sep 27 '22 22:09