I want to split a string into words ([a-zA-Z]) and any special characters it may contain, except that the @ and # symbols should stay attached to their words.
message = "I am to be @split, into #words, And any other thing that is not word, mostly special character(.,>)"
Expected Result:
['I', 'am', 'to', 'be', '@split', ',', 'into', '#words', ',', 'And', 'any', 'other', 'thing', 'that', 'is', 'not', 'word', ',', 'mostly', 'special', 'character', '(', '.', ',', '>', ')']
How can I achieve this in Python?
How about:
re.findall(r"[A-Za-z@#]+|\S", message)
The pattern matches any sequence of word characters (here, defined as letters plus @ and #), or any single non-whitespace character.
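As a runnable sketch of the findall approach above (the names TOKEN_PATTERN and tokenize are only illustrative and do not come from the answer):

```python
import re

# A "word" is a run of ASCII letters, @ or #; any other non-whitespace
# character becomes a token of its own. Whitespace is skipped entirely.
TOKEN_PATTERN = re.compile(r"[A-Za-z@#]+|\S")

def tokenize(text):
    """Return the list of word and special-character tokens in text."""
    return TOKEN_PATTERN.findall(text)

message = ("I am to be @split, into #words, And any other thing "
           "that is not word, mostly special character(.,>)")
tokens = tokenize(message)
print(tokens)
```

Note that digits are not part of the word class here, so each digit would come out as a separate single-character token.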
You can use a negated character class to describe the characters to split on: [^\w@#] matches any character that is not a letter, digit, underscore, @ or #.
Then you can capture the special characters as well using capturing parentheses in re.split.
list(filter(None, re.split(r'\s|([^\w@#])', message)))
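The filter removes the empty strings that appear when two separators are adjacent, as well as the None entries that re.split emits when it splits on whitespace (the capture group did not take part in that match). The \s| alternative comes first so that whitespace is split on but not captured.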
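A minimal sketch of the re.split approach, showing the raw split output before filtering (the helper name split_tokens is hypothetical):

```python
import re

def split_tokens(text):
    """Split on whitespace and special characters, keeping the latter."""
    parts = re.split(r"\s|([^\w@#])", text)
    # Drop both '' (between adjacent separators) and None (whitespace splits).
    return [p for p in parts if p]

# Raw output before filtering, on a small input:
raw = re.split(r"\s|([^\w@#])", "a, b")
print(raw)  # ['a', ',', '', None, 'b']

message = ("I am to be @split, into #words, And any other thing "
           "that is not word, mostly special character(.,>)")
result = split_tokens(message)
print(result)
```

Because \w also covers digits and underscores, this version keeps something like abc123 together as one token, which differs from a letters-only pattern.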