I am trying to split a string at any special character using re.split() from the re package. This is what I have done so far, but it doesn't seem to work yet. Any ideas?
word = [b for b in re.split(r'\`\-\=\~\!\@\#\$\%\^\&\*\(\)\_\+\[\]\{\}\;\'\\\:\"\|\<\,\.\/\>\<\>\?', a)]
Instead of enumerating all the "special" characters, it might be easier to define a class of characters at which not to split and negate it with a leading ^ character.
For example, re.split(r"[^\w\s]", s) will split at any character that is in neither the class \w nor \s (for ASCII text these are [a-zA-Z0-9_] and [ \t\n\r\f\v] respectively; in Python 3, str patterns also match Unicode letters, digits, and whitespace unless re.ASCII is set; see the re module documentation for more info). However, note that the _ character is included in the \w class, so you might instead want to explicitly specify all the "regular" characters, e.g. re.split(r"[^a-zA-Z0-9\s]", s).
>>> re.split(r"[^a-zA-Z0-9\s]", "foo bar_blub23/x~y'z")
['foo bar', 'blub23', 'x', 'y', 'z']
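One thing to watch for with this approach: when two special characters are adjacent, or one sits at the start or end of the string, re.split() returns empty strings in the result. Here is a minimal sketch, assuming you want to drop those empty pieces. The helper name split_on_special is made up for illustration.

```python
import re

def split_on_special(s):
    # Hypothetical helper: split wherever one or more characters occur that
    # are not ASCII letters, digits, or whitespace.
    # The + makes a run of adjacent special characters act as one separator.
    parts = re.split(r"[^a-zA-Z0-9\s]+", s)
    # A special character at the very start or end still yields an empty
    # string, so filter those out.
    return [p for p in parts if p]

print(split_on_special("foo bar_blub23/x~y'z"))  # ['foo bar', 'blub23', 'x', 'y', 'z']
print(split_on_special("!!a--b??"))              # ['a', 'b']
```

Without the + and the filter, the second input would come back with empty strings wherever two separators touch.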
Use a character class:
re.split(r'[`\-=~!@#$%^&*()_+\[\]{};\'\\:"|<,./<>?]', a)
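To see why the brackets matter, here is a small comparison. The sample string is made up for illustration. The first pattern is the one from the question, the second is the character class above.

```python
import re

a = "foo-bar_baz!qux"  # sample input, made up for illustration

# Without brackets, the pattern is one long literal sequence, so it only
# matches if all of those characters appear together in that exact order.
no_class = re.split(
    r'\`\-\=\~\!\@\#\$\%\^\&\*\(\)\_\+\[\]\{\}\;\'\\\:\"\|\<\,\.\/\>\<\>\?', a)

# Inside [...], each character is an alternative, so any single one of
# them is enough to split the string.
with_class = re.split(r'[`\-=~!@#$%^&*()_+\[\]{};\'\\:"|<,./<>?]', a)

print(no_class)    # ['foo-bar_baz!qux']
print(with_class)  # ['foo', 'bar', 'baz', 'qux']
```

The question's pattern leaves the string in one piece because that exact run of characters never occurs in it.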