Empty space character after re.split

Here's a line from a .txt file I'm reading in, and I'm assigning it to x:

import re

x = "Wild_lions live mostly in “Africa”"
result = re.split('[^a-zA-Z0-9]+', x)

I end up getting:

['Wild', 'lions', 'live', 'mostly', 'in', 'Africa', ''] # (there's an empty space character as the last element)

Why is there an empty space at the end? I realize I can just do result.remove(' ') to get rid of the space, but for large files I think this would be pretty inefficient.

dppham1 asked Oct 31 '25 12:10


2 Answers

You don't need such a complex regex for this. Splitting on whitespace is simpler:

# Raw string so the backslash in \s is passed to the regex engine unchanged
result = re.split(r'\s+', x)
result
# ['Wild_lions', 'live', 'mostly', 'in', '“Africa”']

\s+ matches one or more consecutive whitespace characters (spaces, tabs, newlines, etc.), so each run of whitespace counts as a single separator.
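To illustrate how \s+ treats mixed whitespace, here is a small sketch. The sample and padded strings are made up for the example and are not from the question. It also shows that whitespace at the start or end of the input still produces empty strings with re.split, while str.split() with no arguments does not.

```python
import re

# Hypothetical sample mixing a tab, a double space, and a newline.
sample = "Wild_lions\tlive  mostly\nin Africa"

# Each run of whitespace is treated as one separator.
parts = re.split(r"\s+", sample)
print(parts)  # ['Wild_lions', 'live', 'mostly', 'in', 'Africa']

# Leading or trailing whitespace is matched as a separator at the edge
# of the string, so re.split reports an empty string on that side.
padded = "  in Africa \n"
print(re.split(r"\s+", padded))  # ['', 'in', 'Africa', '']

# str.split() with no arguments ignores leading and trailing whitespace.
print(padded.split())  # ['in', 'Africa']
```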


If you only want the alphabetic words, it's better to match them directly using re.compile with findall:

myre = re.compile('[a-zA-Z]+')
myre.findall(x)
# ['Wild', 'lions', 'live', 'mostly', 'in', 'Africa']
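As for why the original split ends with an extra element: it is an empty string, not a space. The closing curly quote is the last character of x and is matched as a separator, so re.split reports the (empty) text that follows it. A minimal sketch, reusing the question's string and pattern, with a filter as one possible workaround:

```python
import re

x = "Wild_lions live mostly in “Africa”"

# The final ” matches the separator pattern at the very end of the string,
# so the piece "after" it is an empty string.
raw = re.split(r"[^a-zA-Z0-9]+", x)
print(raw)  # ['Wild', 'lions', 'live', 'mostly', 'in', 'Africa', '']

# One pass over the list drops every empty string.
words = [w for w in raw if w]
print(words)  # ['Wild', 'lions', 'live', 'mostly', 'in', 'Africa']

# findall with the complementary character class never yields empty strings.
same = re.findall(r"[a-zA-Z0-9]+", x)
print(same == words)  # True
```

Note that result.remove(' ') would not help here, because the list contains '' rather than ' '.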
m0nhawk answered Nov 02 '25 02:11


Try splitting on whitespace and underscores only:

x = "Wild_lions live mostly in 'Africa'"
# Raw string so the backslash in \s is passed to the regex engine unchanged
result = re.split(r'[\s_]+', x)

You'll get:

['Wild', 'lions', 'live', 'mostly', 'in', "'Africa'"]
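This pattern leaves the quotes attached to the last word. If they are unwanted, one option (an assumption about what the asker is after, not part of this answer) is to strip quote characters from each piece afterwards. A sketch using the curly-quoted string from the question:

```python
import re

x = "Wild_lions live mostly in “Africa”"

# Split on runs of whitespace and underscores only.
parts = re.split(r"[\s_]+", x)
print(parts)  # ['Wild', 'lions', 'live', 'mostly', 'in', '“Africa”']

# Remove straight and curly quotes from both ends of every piece.
stripped = [p.strip("'\"“”") for p in parts]
print(stripped)  # ['Wild', 'lions', 'live', 'mostly', 'in', 'Africa']
```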
Cony answered Nov 02 '25 00:11


