I'm trying to divide a string into words, removing spaces and punctuation marks.
I tried using the split() method, passing all the punctuation at once, but my results were incorrect:
>>> test='hello,how are you?I am fine,thank you. And you?'
>>> test.split(' ,.?')
['hello,how are you?I am fine,thank you. And you?']
I actually know how to do this with regexes already, but I'd like to figure out how to do it using split(). Please don't give me a regex solution.
This is the best way I can think of without using the re module:
"".join((char if char.isalpha() else " ") for char in test).split()
If you want to split a string based on multiple delimiters, as in your example, you're going to need to use the re module despite your bizarre objections, like this:
>>> import re
>>> re.split('[?.,]', test)
['hello', 'how are you', 'I am fine', 'thank you', ' And you', '']
It's possible to get a similar result using split, but you need to call split once for every character, and you need to iterate over the results of the previous split. This works but it's u-g-l-y:
>>> sum([z.split()
... for z in sum([y.split('?')
... for y in sum([x.split('.')
... for x in test.split(',')],[])], [])], [])
['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']
This uses sum() to flatten the list returned by the previous iteration.
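The same split-then-flatten idea can probably be written more readably as a loop. The sketch below is one possible way to do it, still using only str.split; the name split_on_each and its delimiters parameter are invented for this example.

```python
def split_on_each(text, delimiters):
    # Hypothetical helper: call split() once per delimiter character,
    # feeding each pass the pieces produced by the previous one.
    pieces = [text]
    for delim in delimiters:
        # The nested comprehension flattens the list of lists,
        # doing the job sum(..., []) does in the version above.
        pieces = [part for piece in pieces for part in piece.split(delim)]
    # A final split() on whitespace also discards empty strings.
    return [word for piece in pieces for word in piece.split()]


test = 'hello,how are you?I am fine,thank you. And you?'
result = split_on_each(test, ',.?')
print(result)
```

Adding another delimiter then means adding a character to the string argument rather than another level of nesting.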