python split text by quotes and spaces

I have the following text

text = 'This is "a simple" test'

And I need to split it in two ways, first by quotes and then by spaces, resulting in:

res = ['This', 'is', '"a simple"', 'test']

But with str.split() I can only use either quotes or spaces as the delimiter, not both. Is there a built-in function that handles multiple delimiters?

wasp256 asked Jul 11 '17 10:07



1 Answer

You can use shlex.split, handy for parsing quoted strings:

>>> import shlex
>>> text = 'This is "a simple" test'
>>> shlex.split(text, posix=False)
['This', 'is', '"a simple"', 'test']

Non-POSIX mode keeps the quote characters in the split result. The posix argument defaults to True, which strips them:

>>> shlex.split(text)
['This', 'is', 'a simple', 'test']
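
The two modes can be wrapped in one small helper. This is only a minimal sketch: the name split_quoted and its keep_quotes parameter are hypothetical, and only shlex.split and its posix argument come from the standard library.

```python
import shlex


def split_quoted(text, keep_quotes=True):
    """Split on whitespace, treating quoted substrings as single tokens.

    split_quoted and keep_quotes are illustrative names, not part of shlex.
    """
    # posix=False leaves the quote characters in the tokens;
    # posix=True (the shlex default) strips them.
    return shlex.split(text, posix=not keep_quotes)


text = 'This is "a simple" test'
print(split_quoted(text))                     # ['This', 'is', '"a simple"', 'test']
print(split_quoted(text, keep_quotes=False))  # ['This', 'is', 'a simple', 'test']
```

Note that shlex.split raises ValueError when a quote is left unclosed, so input with unbalanced quotes needs a try/except around the call.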

If you have multiple lines of this type of text or you're reading from a stream, you can split efficiently (excluding the quotes in the output) using csv.reader:

import io
import csv

s = io.StringIO(text)  # in-memory text stream
f = csv.reader(s, delimiter=' ', quotechar='"')
print(list(f))
# [['This', 'is', 'a simple', 'test']]

The code above is for Python 3, where all strings are already unicode. On Python 2, decode the byte string first with io.StringIO(text.decode('utf8')), because io.StringIO only accepts unicode there.
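
For several lines of input, csv.reader accepts any iterable of strings, so a list of lines or an open text file works as well as io.StringIO. A rough sketch, where split_lines and the sample lines are made up for illustration:

```python
import csv


def split_lines(lines):
    """Split each line on single spaces, keeping double-quoted parts together.

    split_lines is an illustrative helper; lines can be any iterable of
    strings, such as a list or an open text file.
    """
    reader = csv.reader(lines, delimiter=' ', quotechar='"')
    return [row for row in reader]


sample = [
    'This is "a simple" test',
    'another "quoted part" here',
]
for row in split_lines(sample):
    print(row)
# ['This', 'is', 'a simple', 'test']
# ['another', 'quoted part', 'here']
```

Unlike shlex.split, this treats every single space as a delimiter, so two consecutive spaces produce an empty string in the row.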

Moses Koledoye answered Oct 13 '22 17:10