Empty space character after re.split

Question

Here's a line from a .txt file I'm reading in, and I'm assigning it to x:

import re

x = "Wild_lions live mostly in “Africa”"
result = re.split('[^a-zA-Z0-9]+', x)

I end up getting:

['Wild', 'lions', 'live', 'mostly', 'in', 'Africa', ''] # (there's an empty space character as the last element)

Why is there an empty space at the end? I realize I can just do result.remove(' ') to get rid of the space, but for large files I think this would be pretty inefficient.

m0nhawk · Accepted Answer

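You don't need such a complex regex for this. The trailing empty string appears because the closing quote ” at the end of the line matches your pattern, and re.split returns the (empty) text that follows that last match. Splitting on whitespace is simpler: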

result = re.split(r'\s+', x)
result
# ['Wild_lions', 'live', 'mostly', 'in', '“Africa”']

The \s+ matches one or more consecutive whitespace characters (spaces, tabs, newlines, etc.).
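As a rough illustration of how \s+ treats mixed whitespace, here is a small sketch. The sample strings are made up for the example. It also shows that re.split still returns empty strings when the line starts or ends with whitespace, while str.split() with no arguments does not.

```python
import re

# Hypothetical sample containing a tab, a newline and repeated spaces.
sample = "one\ttwo\n three  four"
print(re.split(r'\s+', sample))   # ['one', 'two', 'three', 'four']

# Leading or trailing whitespace still yields empty strings with re.split ...
padded = "  one two \n"
print(re.split(r'\s+', padded))   # ['', 'one', 'two', '']

# ... whereas str.split() with no arguments discards them.
print(padded.split())             # ['one', 'two']
```

If whitespace is the only separator you care about, str.split() is probably the simplest option.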

If you only need the alphabetic words, it's better to use re.compile with findall:

myre = re.compile('[a-zA-Z]+')
myre.findall(x)
# ['Wild', 'lions', 'live', 'mostly', 'in', 'Africa']
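To make the cause of the trailing element concrete, the sketch below reruns the split from the question. Note that the last element is an empty string (''), not a space, so result.remove(' ') would raise ValueError rather than remove it. The list comprehension is just one possible way to filter.

```python
import re

x = "Wild_lions live mostly in “Africa”"

# The closing quote ” at the very end matches [^a-zA-Z0-9]+, so re.split
# reports the (empty) text that follows that final delimiter.
parts = re.split(r'[^a-zA-Z0-9]+', x)
print(parts)   # ['Wild', 'lions', 'live', 'mostly', 'in', 'Africa', '']

# Dropping empty strings in one pass keeps the split-based approach.
words = [p for p in parts if p]
print(words)   # ['Wild', 'lions', 'live', 'mostly', 'in', 'Africa']

# findall collects the matches directly, so there is nothing to filter.
print(re.findall(r'[a-zA-Z0-9]+', x) == words)   # True
```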

Cony · Answer

Try this:

x = "Wild_lions live mostly in 'Africa'"
result = re.split(r'[\s_]+', x)

You'll get:

['Wild', 'lions', 'live', 'mostly', 'in', "'Africa'"]
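Since the question is about reading lines from a file, here is a minimal sketch of how the same pattern might be applied line by line. The helper name split_words and the sample contents are hypothetical, and io.StringIO stands in for a real open file. Each line read from a file ends with a newline, which matches \s, so the empty strings are filtered out.

```python
import io
import re

# Compiled once and reused for every line; same pattern as in the answer above.
SEPARATORS = re.compile(r'[\s_]+')

def split_words(line):
    """Split on whitespace/underscores and drop any empty strings."""
    return [w for w in SEPARATORS.split(line) if w]

# io.StringIO stands in for an open .txt file (hypothetical contents).
fake_file = io.StringIO("Wild_lions live mostly in 'Africa'\n_leading and trailing_\n")
for line in fake_file:
    print(split_words(line))
# ['Wild', 'lions', 'live', 'mostly', 'in', "'Africa'"]
# ['leading', 'and', 'trailing']
```

The filtering is a single pass over each line's pieces, so it should not be a bottleneck even for large files.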

Tags: python, regex, split, expression

dppham1
