Strip non-alphanumeric characters from a string in Python but keep special characters

Question

I know similar questions have been asked here on StackOverflow. I tried to adapt some of the approaches, but I couldn't get anything to work that fits my needs:

Given a Python string, I want to strip every non-alphanumeric character but leave any special character like µ æ Å Ç ß... Is this even possible? With regexes I tried variations of this:

re.sub(r'[^a-zA-Z0-9: ]', '', x) # x is my string to sanitize

but it strips more than I want. An example of what I want would be:

Input:  "A string, with characters µ, æ, Å, Ç, ß,... Some    whitespace  confusion  ?"
Output: "A string with characters µ æ Å Ç ß Some whitespace confusion"

Is this even possible without getting complicated?

Ray Toal · Accepted Answer

Use \w with the UNICODE flag (re.UNICODE, or re.U for short) set. This also matches the underscore, so you might need to handle that separately.

Details are at http://docs.python.org/library/re.html.

EDIT: Here is some actual code. It keeps Unicode letters, Unicode digits, and whitespace.

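import re
x = u'$a_bßπ7: ^^@p'
# Match anything that is neither a word character nor whitespace.
pattern = re.compile(r'[^\w\s]', re.U)
# Strip those characters first, then drop underscores, which \w also matches.
result = pattern.sub('', x).replace('_', '')
print(result)  # abßπ7 p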

Without re.U, the ß and π characters would have been stripped in Python 2. In Python 3, str patterns are Unicode-aware by default, so the flag is redundant there.
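
To see the effect of Unicode matching side by side, here is a small sketch that assumes Python 3, where the ASCII-only behaviour has to be requested explicitly with re.ASCII. The variable names unicode_result and ascii_result are only illustrative.

```python
import re

x = '$a_bßπ7: ^^@p'

# Default Unicode matching for str patterns: ß and π count as word characters.
unicode_result = re.sub(r'[^\w\s]', '', x)

# re.ASCII restricts \w to [a-zA-Z0-9_], so ß and π are stripped too.
ascii_result = re.sub(r'[^\w\s]', '', x, flags=re.ASCII)

print(unicode_result)  # a_bßπ7 p
print(ascii_result)    # a_b7 p
```

Note that the underscore survives in both results, which is why it is removed in a separate step above.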

Sorry I can't figure out a way to do this with one regex. If you can, can you post a solution?
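
One possible single-pattern approach, offered as a sketch rather than as part of the original answer: add the underscore as an alternative in the same pattern, then collapse runs of whitespace to get the output shown in the question. This assumes Python 3 (Unicode-aware \w by default), and strip_non_alphanumeric is a hypothetical helper name.

```python
import re

def strip_non_alphanumeric(text):
    """Keep Unicode letters and digits; drop punctuation, symbols, and underscores."""
    # One pattern: anything that is neither a word character nor whitespace, or an underscore.
    cleaned = re.sub(r'[^\w\s]|_', '', text)
    # Collapse runs of whitespace into single spaces and trim both ends.
    return ' '.join(cleaned.split())

text = "A string, with characters µ, æ, Å, Ç, ß,... Some    whitespace  confusion  ?"
print(strip_non_alphanumeric(text))
# A string with characters µ æ Å Ç ß Some whitespace confusion
```

The whitespace step is separate from the regex on purpose: removing punctuation can leave doubled spaces behind, and str.split() followed by ' '.join() cleans those up without a second pattern.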

Tags: python, string, special-characters, translation

Aufwind
