Don't split double-quoted words with Python string split()?

When using the Python string function split(), does anybody have a nifty trick to treat items surrounded by double-quotes as a non-splitting word?

Say I want to split only on white space and I have this:

>>> myStr = 'A B\t"C" DE "FE"\t\t"GH I JK L" "" ""\t"O P   Q" R'
>>> myStr.split()
['A', 'B', '"C"', 'DE', '"FE"', '"GH', 'I', 'JK', 'L"', '""', '""', '"O', 'P', 'Q"', 'R']

I'd like to treat anything within double quotes as a single word, even if it contains whitespace, so I'd like to end up with this:

['A', 'B', 'C', 'DE', 'FE', 'GH I JK L', '', '', 'O P   Q', 'R']

Or at least this and then I'll strip off the double-quotes:

['A', 'B', '"C"', 'DE', '"FE"', '"GH I JK L"', '""', '""', '"O P   Q"', 'R']

Any non-regex suggestions?

Rob asked Oct 24 '11 20:10



2 Answers

@Rob: why avoid regexes when the regex solution is so simple?

import re
my_str = 'A B\t"C" DE "FE"\t\t"GH I JK L" "" ""\t"O P   Q" R'
# \w+ matches a bare word; ".*?" matches the shortest double-quoted run
print(re.findall(r'(\w+|".*?")', my_str))
['A', 'B', '"C"', 'DE', '"FE"', '"GH I JK L"', '""', '""', '"O P   Q"', 'R']
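The pattern above keeps the surrounding quotes, and `\w+` only matches word characters, so an unquoted item such as `D-E` would come back as two pieces. If the unquoted list from the question is the goal, a variation along these lines might work. This is only a sketch: the helper name `split_keep_quoted` is made up for illustration, it swaps `\w+` for `\S+`, and it assumes quotes are never escaped inside a quoted item.

```python
import re

def split_keep_quoted(text):
    """Split on whitespace, keeping double-quoted runs together.

    Hypothetical helper: assumes no escaped quotes inside quoted items.
    """
    tokens = []
    # Group 1 captures the inside of a quoted run; group 2 a bare token.
    for match in re.finditer(r'"([^"]*)"|(\S+)', text):
        quoted, bare = match.groups()
        # An empty quoted item ("") yields '' rather than None, so test for None.
        tokens.append(quoted if quoted is not None else bare)
    return tokens

my_str = 'A B\t"C" DE "FE"\t\t"GH I JK L" "" ""\t"O P   Q" R'
print(split_keep_quoted(my_str))
# ['A', 'B', 'C', 'DE', 'FE', 'GH I JK L', '', '', 'O P   Q', 'R']
```

Capturing the quoted body in its own group avoids a separate strip step and keeps the empty `""` items as empty strings.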
PabloG answered Oct 17 '22 23:10


You won't be able to get this behaviour with str.split(). If you can live with the rather complex parsing that shlex.split() does (for example, it does not treat a double quote preceded by a backslash as a delimiter), it might be what you are looking for:

>>> import shlex
>>> shlex.split(myStr)
['A', 'B', 'C', 'DE', 'FE', 'GH I JK L', '', '', 'O P   Q', 'R']
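To make the "rather complex parsing" concrete, here is a small sketch of three behaviours worth knowing before relying on shlex.split(): backslash escapes inside double quotes, the `posix=False` mode that leaves the quotes in place (which gives the second form asked for in the question), and the error raised for an unbalanced quote. The sample strings other than `my_str` are made up for illustration.

```python
import shlex

my_str = 'A B\t"C" DE "FE"\t\t"GH I JK L" "" ""\t"O P   Q" R'

# Default (POSIX) mode: quotes are removed, quoted runs stay together.
unquoted = shlex.split(my_str)

# Non-POSIX mode: same grouping, but the double quotes are kept.
quoted = shlex.split(my_str, posix=False)

# In POSIX mode a backslash inside double quotes escapes the next quote.
escaped = shlex.split(r'a "b \" c" d')

# An unbalanced quote is an error, not a silently split token.
try:
    shlex.split('a "b')
    unbalanced_error = None
except ValueError as exc:
    unbalanced_error = exc

print(unquoted)   # ['A', 'B', 'C', 'DE', 'FE', 'GH I JK L', '', '', 'O P   Q', 'R']
print(quoted)     # ['A', 'B', '"C"', 'DE', '"FE"', '"GH I JK L"', '""', '""', '"O P   Q"', 'R']
print(escaped)    # ['a', 'b " c', 'd']
```

Single quotes are also treated as quoting characters in both modes, so input containing stray apostrophes may need different handling.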
Sven Marnach answered Oct 17 '22 21:10