
Dividing a string at various punctuation marks using split()

I'm trying to divide a string into words, removing spaces and punctuation marks.

I tried using the split() method, passing all the punctuation at once, but my results were incorrect:

>>> test='hello,how are you?I am fine,thank you. And you?'
>>> test.split(' ,.?')
['hello,how are you?I am fine,thank you. And you?']

I actually know how to do this with regexes already, but I'd like to figure out how to do it using split(). Please don't give me a regex solution.

leisurem asked Mar 21 '12 01:03


2 Answers

This is the best way I can think of without using the re module:

"".join((char if char.isalpha() else " ") for char in test).split()
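For reuse, the same idea can be wrapped in a small function. This is only a sketch: the name `split_words` is my own and not anything from the standard library. Be aware that `isalpha()` treats digits and apostrophes as separators too, so "don't" comes back as two pieces.

```python
def split_words(text):
    """Return the alphabetic words in text; everything else is a separator."""
    # Replace every non-letter with a space so that one whitespace
    # split() can handle all of the delimiters at once.
    spaced = "".join(ch if ch.isalpha() else " " for ch in text)
    # split() with no argument collapses runs of whitespace
    # and never returns empty strings.
    return spaced.split()


test = 'hello,how are you?I am fine,thank you. And you?'
print(split_words(test))
# ['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']
```

If you need to keep digits, swapping `isalpha()` for `isalnum()` should do it.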
Elias Zamaria answered Sep 20 '22 12:09


If you want to split a string on multiple delimiters, as in your example, the re module is the direct tool for the job despite your bizarre objections, like this:

>>> import re
>>> re.split('[?.,]', test)
['hello', 'how are you', 'I am fine', 'thank you', ' And you', '']
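To go all the way to individual words, one option is to add the space to the character class and match runs of delimiters. A minimal sketch, assuming only spaces and the three punctuation marks from the question need handling:

```python
import re

test = 'hello,how are you?I am fine,thank you. And you?'

# Split on runs of spaces or the listed punctuation marks.
parts = re.split(r'[ ?.,]+', test)
# The trailing '?' leaves an empty string at the end; filter it out.
words = [p for p in parts if p]
print(words)
# ['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']
```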

It's possible to get a similar result using split, but you need to call split once for every delimiter character, and each call has to be applied to every piece produced by the previous split. This works but it's u-g-l-y:

>>> sum([z.split() 
... for z in sum([y.split('?') 
... for y in sum([x.split('.') 
... for x in test.split(',')],[])], [])], [])
['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']

This uses sum() to flatten the list returned by the previous iteration.
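The same repeated splitting reads more easily as a loop. Here is a rough sketch; `split_on_all` is a name I made up, and it uses `itertools.chain.from_iterable` rather than `sum()` to flatten each round of results:

```python
from itertools import chain


def split_on_all(text, delimiters):
    """Split text on each character in delimiters, then on whitespace."""
    pieces = [text]
    for delim in delimiters:
        # Split every piece on the current delimiter and flatten one level.
        pieces = list(chain.from_iterable(p.split(delim) for p in pieces))
    # The final whitespace split also discards any empty pieces.
    return list(chain.from_iterable(p.split() for p in pieces))


test = 'hello,how are you?I am fine,thank you. And you?'
print(split_on_all(test, ',.?'))
# ['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']
```

It still calls split once per delimiter, so it is no faster in principle, just easier to extend with more delimiters.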

larsks answered Sep 19 '22 12:09