I am expecting a user input string which I need to split into separate words. The user may input text delimited by commas or spaces.
So for instance the text may be:
hello world this is John
or
hello,world,this,is,John
or even
hello world, this, is John
How can I efficiently parse that text into the following list?
['hello', 'world', 'this', 'is', 'John']
Thanks in advance.
Use the regular expression r'[\s,]+' to split on one or more whitespace characters (\s) or commas (,).
import re
s = 'hello world, this, is John'
print(re.split(r'[\s,]+', s))
['hello', 'world', 'this', 'is', 'John']
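One thing to watch for with user input: if the string starts or ends with a delimiter, re.split returns an empty string at that end. A minimal sketch of one way to handle it, assuming empty pieces should simply be dropped (split_words is a hypothetical helper name, not part of the re module):

```python
import re

def split_words(text):
    """Split text on runs of whitespace and/or commas, dropping empty pieces."""
    # re.split yields '' at either end when the text starts or ends with a delimiter,
    # so filter those out.
    return [word for word in re.split(r'[\s,]+', text) if word]

messy = ' hello world, this, is John,'
print(re.split(r'[\s,]+', messy))  # ['', 'hello', 'world', 'this', 'is', 'John', '']
print(split_words(messy))          # ['hello', 'world', 'this', 'is', 'John']
```

Calling text.strip(' ,') style trimming before splitting would probably work too; filtering afterwards just avoids having to list the delimiters twice.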
Since you need to split based on spaces and other special characters, the best RegEx would be \W+. Quoting from the Python re documentation:
\W
When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_]. With LOCALE, it will match any character not in the set [0-9_], and not defined as alphanumeric for the current locale. If UNICODE is set, this will match anything other than [0-9_] plus characters classified as not alphanumeric in the Unicode character properties database.
For example:
import re
data = "hello world, this, is John"
print(re.split(r"\W+", data))
# ['hello', 'world', 'this', 'is', 'John']
Or, if you know the exact set of delimiter characters to split on, you can list them in a character class:
print(re.split(r"[\s,]+", data))
This splits on any run of whitespace characters (\s) and commas (,).
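The two patterns agree on the sample input, but they can diverge on other text, because \W+ treats every non-word character as a delimiter, including apostrophes and hyphens. A small comparison, using a made-up input string, illustrates the difference:

```python
import re

# Hypothetical input containing punctuation inside words.
text = "don't stop, it's well-known"

by_nonword = re.split(r"\W+", text)          # splits on anything outside [a-zA-Z0-9_]
by_space_comma = re.split(r"[\s,]+", text)   # splits only on whitespace and commas

print(by_nonword)      # ['don', 't', 'stop', 'it', 's', 'well', 'known']
print(by_space_comma)  # ["don't", 'stop', "it's", 'well-known']
```

So if the input may contain contractions or hyphenated words that should stay intact, the explicit character class is likely the safer choice.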