
How to split at ALL special characters with re.split()

Tags: python, regex

I am trying to split a string at any special character using re.split() from the re module. This is what I have so far, but it doesn't work. Any ideas?

word = [b for b in re.split(r'\`\-\=\~\!\@\#\$\%\^\&\*\(\)\_\+\[\]\{\}\;\'\\\:\"\|\<\,\.\/\>\<\>\?', a)]
user706838 asked Nov 27 '22 03:11


2 Answers

Instead of enumerating all the "special" characters, it might be easier to define a character class of the characters at which not to split, and to negate it with the ^ character.

For example, re.split(r"[^\w\s]", s) will split at any character that's in neither the class \w nor \s (for ASCII text, [a-zA-Z0-9_] and [ \t\n\r\f\v] respectively; in Python 3, \w on a str pattern also matches non-ASCII letters and digits unless re.ASCII is set). However, note that the _ character is included in the \w class, so you might instead want to explicitly specify all the "regular" characters, e.g. re.split(r"[^a-zA-Z0-9\s]", s). Keep in mind that this explicit class also splits at non-ASCII letters such as "é".

>>> re.split(r"[^a-zA-Z0-9\s]", "foo bar_blub23/x~y'z")
['foo bar', 'blub23', 'x', 'y', 'z']
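
As a rough illustration of the difference between the two patterns, the sketch below runs both on the same string. The `+` quantifier and the empty-string filter in the last step are additions that are not part of the patterns above; they are just one possible way to handle runs of special characters and separators at the start or end of the string.

```python
import re

s = "foo bar_blub23/x~y'z"

# \w includes the underscore, so "bar_blub23" stays in one piece.
with_underscore = re.split(r"[^\w\s]", s)
print(with_underscore)     # ['foo bar_blub23', 'x', 'y', 'z']

# Listing the "regular" characters explicitly also splits at "_".
without_underscore = re.split(r"[^a-zA-Z0-9\s]", s)
print(without_underscore)  # ['foo bar', 'blub23', 'x', 'y', 'z']

# Assumption: runs of special characters should count as one separator.
# The "+" collapses each run, and the filter drops the empty strings that
# re.split() returns for separators at the very start or end.
parts = [p for p in re.split(r"[^a-zA-Z0-9\s]+", "!!foo--bar??") if p]
print(parts)               # ['foo', 'bar']
```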
tobias_k answered Dec 10 '22 15:12


Use a character class:

re.split(r'[`\-=~!@#$%^&*()_+\[\]{};\'\\:"|<,./<>?]', a)
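
A small sketch of why the brackets matter: without them, the pattern is one long literal sequence that only matches if all of those characters appear in the text in exactly that order. The sample string and variable names below are made up for illustration. The last step builds the class from `string.punctuation` with `re.escape` (both in the standard library) instead of typing it by hand; that constant covers only ASCII punctuation, which may or may not match what counts as "special" in your data.

```python
import re
import string

a = "hello,world!foo-bar"  # hypothetical sample input

# No brackets: the regex looks for the literal sequence ",!-" and finds
# nothing, so the string comes back unsplit.
as_sequence = re.split(r'\,\!\-', a)
print(as_sequence)  # ['hello,world!foo-bar']

# With brackets, each character is an alternative on its own.
as_class = re.split(r'[,!\-]', a)
print(as_class)     # ['hello', 'world', 'foo', 'bar']

# Build the class from string.punctuation; re.escape() takes care of
# characters such as ], \, ^ and - that are special inside [...].
punct_class = "[" + re.escape(string.punctuation) + "]"
from_constant = re.split(punct_class, a)
print(from_constant)  # ['hello', 'world', 'foo', 'bar']
```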
Toto answered Dec 10 '22 15:12