I have the following text
text = 'This is "a simple" test'
And I need to split it in two ways, first by quotes and then by spaces, resulting in:
res = ['This', 'is', '"a simple"', 'test']
But with str.split() I'm only able to use either quotes or spaces as delimiters. Is there a built-in function for multiple delimiters?
How do I split a string on spaces but treat quoted substrings as one word? In a regular expression for this, \S* means "followed by zero or more non-space characters".
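A regular-expression take on that question could look like the sketch below. The pattern and the helper name split_keep_quotes are assumptions made for illustration, not something given above. It uses \S+ (one or more non-space characters) rather than \S*, because a pattern that can match the empty string would add empty tokens to the result.

```python
import re

# Hypothetical pattern: either a double-quoted run (quotes kept),
# or a run of one or more non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')

def split_keep_quotes(text):
    """Return whitespace-separated tokens, keeping "quoted parts" whole."""
    return TOKEN.findall(text)

text = 'This is "a simple" test'
print(split_keep_quotes(text))
# ['This', 'is', '"a simple"', 'test']
```

The quoted alternative is listed first so that, at an opening quote, the whole quoted run is tried before the plain non-space run.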
Python's str.split() method splits a string into chunks. It accepts an optional separator argument (plus an optional maxsplit limit); the separator can be any non-empty string. If no separator is given, the string is split on runs of whitespace.
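To make the limitation concrete, here is a short sketch of what str.split() does with the question's string under each single delimiter. The variable names are only illustrative.

```python
text = 'This is "a simple" test'

# No separator: split on runs of whitespace, so the quoted phrase is broken up.
by_space = text.split()
print(by_space)       # ['This', 'is', '"a', 'simple"', 'test']

# Quote as separator: the quoted phrase survives, but the rest is not split.
by_quote = text.split('"')
print(by_quote)       # ['This is ', 'a simple', ' test']

# maxsplit limits the number of splits made.
first_only = text.split(' ', 1)
print(first_only)     # ['This', 'is "a simple" test']
```

Neither call alone gives ['This', 'is', '"a simple"', 'test'], which is why a quote-aware splitter is needed.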
In Python, a string literal is a sequence of characters enclosed in single or double quotes. As far as the language syntax is concerned, there is no difference between single-quoted and double-quoted strings; the two forms can be used interchangeably.
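A small illustration of that point, using the question's own string: choosing single quotes on the outside simply avoids having to escape the inner double quotes. The variable names are illustrative.

```python
# Single-quoted and double-quoted literals produce the same string.
single = 'a simple'
double = "a simple"
print(single == double)   # True

# Single quotes outside let the inner double quotes appear unescaped;
# with double quotes outside, the inner ones must be escaped.
text_a = 'This is "a simple" test'
text_b = "This is \"a simple\" test"
print(text_a == text_b)   # True
```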
You can use shlex.split, handy for parsing quoted strings:
>>> import shlex
>>> text = 'This is "a simple" test'
>>> shlex.split(text, posix=False)
['This', 'is', '"a simple"', 'test']
Doing this in non-posix mode prevents the removal of the inner quotes from the split result. posix is set to True by default:
>>> shlex.split(text)
['This', 'is', 'a simple', 'test']
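One thing worth knowing when using shlex.split on arbitrary input is that it raises ValueError if a quote is never closed. The sketch below wraps both modes shown above in a hypothetical helper, safe_split, which falls back to a plain whitespace split in that case; the helper name, the keep_quotes parameter and the fallback policy are assumptions, not part of shlex.

```python
import shlex

def safe_split(text, keep_quotes=True):
    """Split on whitespace, treating quoted substrings as one token.

    keep_quotes=True uses non-posix mode (quotes stay in the tokens);
    keep_quotes=False uses the default posix mode (quotes are removed).
    """
    try:
        return shlex.split(text, posix=not keep_quotes)
    except ValueError:
        # Raised by shlex for an unbalanced quote ("No closing quotation").
        # Assumed fallback: plain whitespace split.
        return text.split()

print(safe_split('This is "a simple" test'))
# ['This', 'is', '"a simple"', 'test']
print(safe_split('This is "a simple" test', keep_quotes=False))
# ['This', 'is', 'a simple', 'test']
print(safe_split('This is "a simple test'))
# ['This', 'is', '"a', 'simple', 'test']
```

Whether a silent fallback or a re-raised error is better depends on where the text comes from.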
If you have multiple lines of this type of text, or you're reading from a stream, you can split efficiently (excluding the quotes from the output) using csv.reader:
import csv
import io
text = 'This is "a simple" test'
s = io.StringIO(text)  # in-memory text stream (Python 3)
f = csv.reader(s, delimiter=' ', quotechar='"')
print(list(f))
# [['This', 'is', 'a simple', 'test']]
On Python 2, decode the byte string to unicode first, as in io.StringIO(text.decode('utf8')). On Python 3 you won't need to decode, as all strings are already unicode.
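For the multi-line case mentioned above, a minimal Python 3 sketch might look like this. The sample lines are made up for illustration. Note that csv.reader treats every single space as a delimiter, so two consecutive spaces produce an empty field, unlike shlex.split.

```python
import csv
import io

# Hypothetical multi-line input; a real stream (e.g. an open file) works the same way.
lines = 'This is "a simple" test\nAnd "another quoted" line\n'

stream = io.StringIO(lines)
reader = csv.reader(stream, delimiter=' ', quotechar='"')

# Each input line becomes one list of fields, with the quotes removed.
rows = list(reader)
print(rows)
# [['This', 'is', 'a simple', 'test'], ['And', 'another quoted', 'line']]

# Consecutive spaces give an empty field between them.
gap_row = next(csv.reader(io.StringIO('a  b'), delimiter=' ', quotechar='"'))
print(gap_row)
# ['a', '', 'b']
```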