Split by regex without resulting empty strings in Python [duplicate]

Tags:

python

regex

I want to split a string containing an irregularly repeating delimiter, the way the str.split() method does:

>>> ' a b   c  de  '.split()
['a', 'b', 'c', 'de']

However, when I split with a regular expression, the result is different: empty strings sneak into the resulting list:

>>> re.split(r'\s+', ' a b   c  de  ')
['', 'a', 'b', 'c', 'de', '']
>>> re.split(r'\.+', '.a.b...c..de..')
['', 'a', 'b', 'c', 'de', '']
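For comparison, here is a small sketch of where the empty strings come from. str.split() with no argument uses a special algorithm: runs of whitespace count as a single separator, and leading or trailing whitespace does not produce empty strings. With an explicit separator, str.split() behaves like re.split() and yields an empty string wherever the string starts or ends with a delimiter or two delimiters are adjacent. The variable names are only for illustration.

```python
import re

text = '.a.b...c..de..'

# Explicit single-character separator: every '.' splits, so adjacent dots
# and the leading/trailing dots all produce empty strings.
by_char = text.split('.')

# A regex that consumes a whole run of dots: the empties between adjacent
# dots disappear, but the leading and trailing runs still yield one '' each.
by_regex = re.split(r'\.+', text)

# No argument: str.split() collapses whitespace runs and drops empties at the ends.
by_whitespace = ' a b   c  de  '.split()

print(by_char)        # ['', 'a', 'b', '', '', 'c', '', 'de', '', '']
print(by_regex)       # ['', 'a', 'b', 'c', 'de', '']
print(by_whitespace)  # ['a', 'b', 'c', 'de']
```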

What I want to get instead:

>>> some_smart_split_method('.a.b...c..de..')
['a', 'b', 'c', 'de']
Roman asked Jun 19 '15 08:06


2 Answers

The empty strings are an unavoidable result of re.split() when the string starts or ends with a delimiter: the piece before the first match or after the last match is empty (and there are good reasons why this behavior can be desirable). To get rid of them, you can call filter() on the result:

results = re.split(...)
results = list(filter(None, results))

Note that the list() call is only necessary in Python 3: in Python 2, filter() returns a list, while in Python 3 it returns a lazy filter object.
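A self-contained sketch of this approach, wrapped in helpers. The names split_nonempty and split_nonempty_lc are hypothetical (they are not part of the re module); the first combines re.split() with filter(None, ...), which keeps only truthy items and therefore drops every '', and the second does the same with a list comprehension, which some find more readable.

```python
import re

def split_nonempty(pattern, text):
    """Hypothetical helper: split text on a regex and drop empty strings."""
    # filter(None, ...) keeps only truthy items; '' is falsy, so it is removed.
    return list(filter(None, re.split(pattern, text)))

def split_nonempty_lc(pattern, text):
    """Hypothetical helper: same result using a list comprehension."""
    return [part for part in re.split(pattern, text) if part]

print(split_nonempty(r'\s+', ' a b   c  de  '))     # ['a', 'b', 'c', 'de']
print(split_nonempty(r'\.+', '.a.b...c..de..'))     # ['a', 'b', 'c', 'de']
print(split_nonempty_lc(r'\.+', '.a.b...c..de..'))  # ['a', 'b', 'c', 'de']
```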

Walker answered Oct 21 '22 03:10


>>> re.findall(r'\S+', ' a b   c  de  ')
['a', 'b', 'c', 'de']
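This inverts the problem: instead of splitting on the delimiters, re.findall() collects the runs of non-delimiter characters. Because \S+ requires at least one character, it can never match an empty string. The same idea covers the dot example by negating the delimiter in a character class; this sketch assumes the delimiter is a single character (or a set of characters) that can be written as [^...].

```python
import re

# \S+ matches one or more non-whitespace characters.
words = re.findall(r'\S+', ' a b   c  de  ')

# [^.]+ matches one or more characters that are not a dot
# (inside a character class '.' is literal, so it needs no escape).
parts = re.findall(r'[^.]+', '.a.b...c..de..')

print(words)  # ['a', 'b', 'c', 'de']
print(parts)  # ['a', 'b', 'c', 'de']
```

When the delimiter is a longer pattern rather than a character set, writing its complement as a regex gets awkward, and filtering the output of re.split() is usually simpler.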
dlask answered Oct 21 '22 04:10