Python regex splitting on multiple whitespaces

Question

I am expecting a user input string that I need to split into separate words. The user may enter text delimited by commas or spaces, possibly several in a row.

So for instance the text may be:

hello world this is John

or

hello world this is John

or even

hello world, this, is John

How can I efficiently parse that text into the following list?

['hello', 'world', 'this', 'is', 'John']

Thanks in advance.

Mr. Polywhirl · Accepted Answer

Use the regular expression r'[\s,]+' to split on one or more whitespace characters (\s) or commas (,).

import re

s = 'hello world,    this, is       John'
print(re.split(r'[\s,]+', s))

['hello', 'world', 'this', 'is', 'John']
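
One edge case worth noting: if the input starts or ends with a delimiter, re.split returns an empty string at that end of the list. Below is a minimal sketch of one way to handle that, assuming empty tokens should simply be dropped. The helper name split_words is hypothetical and not part of the answer above.

```python
import re

def split_words(text):
    """Split text on runs of whitespace and/or commas, dropping empty tokens.

    split_words is a hypothetical helper name used only for illustration.
    """
    # re.split yields '' at the start or end when the text begins or ends
    # with a delimiter, so filter those out.
    return [word for word in re.split(r'[\s,]+', text) if word]

print(re.split(r'[\s,]+', ' hello world, '))  # ['', 'hello', 'world', '']
print(split_words(' hello world, '))          # ['hello', 'world']
```

Calling text.strip() before splitting would handle stray whitespace at the ends, but not a leading or trailing comma, which is why the sketch filters the result instead.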

thefourtheye · Answer

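Since you need to split on spaces and other special characters, a convenient regex is \W+, which matches one or more non-word characters. Note that it also splits on any other punctuation, not just commas. Quoting from the Python re documentation: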

\W

When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_]. With LOCALE, it will match any character not in the set [0-9_], and not defined as alphanumeric for the current locale. If UNICODE is set, this will match anything other than [0-9_] plus characters classified as not alphanumeric in the Unicode character properties database.

For example:

import re

data = "hello world,    this, is       John"
print(re.split(r"\W+", data))
# ['hello', 'world', 'this', 'is', 'John']

Or, if you know the exact set of characters on which the string has to be split, you can list them in a character class:

print(re.split(r"[\s,]+", data))

This splits on any run of whitespace characters (\s) or commas (,).
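
To make the difference between the two patterns concrete, here is a small sketch on an input that contains an apostrophe and a hyphen. The sample string is made up for illustration and does not come from the question.

```python
import re

# Hypothetical sample input containing punctuation inside words.
data = "it's John-Paul, hello"

# \W+ treats every non-word character as a delimiter, including ' and -.
by_non_word = re.split(r"\W+", data)
# [\s,]+ only splits on whitespace and commas, so inner punctuation survives.
by_listed = re.split(r"[\s,]+", data)

print(by_non_word)  # ['it', 's', 'John', 'Paul', 'hello']
print(by_listed)    # ["it's", 'John-Paul', 'hello']
```

If the input may contain words like these, the explicit character class is probably the safer choice.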


Tags:

python

string

regex

stratis
