
How do you split a string into words and special characters in Python?

I want to split a string into words ([a-zA-Z]) and any special characters it may contain. The @ and # symbols are the exception: they should stay attached to the word they prefix.

message = "I am to be @split, into #words, And any other thing that is not word, mostly special character(.,>)"

Expected Result:

['I', 'am', 'to', 'be', '@split', ',', 'into', '#words', ',', 'And', 'any', 'other', 'thing', 'that', 'is', 'not', 'word', ',', 'mostly', 'special', 'character', '(', '.', ',', '>', ')']

How can I achieve this in Python?

Yax asked May 23 '26 12:05



2 Answers

How about:

import re

re.findall(r"[A-Za-z@#]+|\S", message)

The pattern matches any sequence of word characters (here, defined as letters plus @ and #), or any single non-whitespace character.
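As a quick check, here is a minimal sketch that runs this pattern against the message from the question. The name TOKEN_PATTERN is only an illustrative choice and is not part of the original answer.

```python
import re

message = ("I am to be @split, into #words, And any other thing "
           "that is not word, mostly special character(.,>)")

# Runs of letters, @ and # are kept together; any other
# non-whitespace character becomes its own one-character token.
# TOKEN_PATTERN is a hypothetical name used for illustration.
TOKEN_PATTERN = re.compile(r"[A-Za-z@#]+|\S")

tokens = TOKEN_PATTERN.findall(message)
print(tokens)
```

Because findall collects matches rather than splitting on separators, whitespace never appears in the result and no empty strings need to be filtered out.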

Blckknght answered May 26 '26 00:05



You can use a negated character class to describe the characters to split on: [^\w@#] matches every character except letters, digits, underscore, @, and #.

Wrapping that class in capturing parentheses makes re.split keep the special characters in its result instead of discarding them.

import re

# list() is needed in Python 3, where filter returns a lazy iterator
list(filter(None, re.split(r'\s|([^\w@#])', message)))

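The filter removes the empty strings produced when two separators sit next to each other, as well as the None entries that re.split emits whenever the capturing group does not take part in a match (that is, when the split happens on whitespace). The \s| alternative lets whitespace act as a separator without being captured.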
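The sketch below wraps the split-based approach in a small helper and compares it with a findall pattern built on [A-Za-z@#]. The helper name split_tokens and the sample string are assumptions made for illustration. The point is that \w also covers digits and underscores, so the two patterns disagree on input such as "room42".

```python
import re

def split_tokens(text):
    """Split on whitespace, keeping every other non-word character as a token.

    split_tokens is a hypothetical helper name, not part of the answer above.
    """
    parts = re.split(r"\s|([^\w@#])", text)
    # Drop the None entries (whitespace splits) and the empty strings
    # (adjacent separators) that re.split leaves behind.
    return [part for part in parts if part]

sample = "room42 is @home!"

by_split = split_tokens(sample)
by_findall = re.findall(r"[A-Za-z@#]+|\S", sample)

print(by_split)    # ['room42', 'is', '@home', '!']
print(by_findall)  # ['room', '4', '2', 'is', '@home', '!']
```

If digits should stay attached to words, the \w-based class is the better fit. If a word is strictly [a-zA-Z], as the question states, the letter-only class matches the requirement more closely.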

Explosion Pills answered May 26 '26 02:05
