Removing accents and special characters [duplicate]

Tags:

python

diacritics

Possible Duplicate:
What is the best way to remove accents in a python unicode string?
Python and character normalization

I would like to remove accents, turn all characters to lowercase, and delete any numbers and special characters.

Example :

Frédér8ic@ --> frederic

Proposal:

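import unicodedata

def remove_accents(data):
    # Decompose accented characters, keep only letters, then lowercase.
    decomposed = unicodedata.normalize('NFKD', data)
    return ''.join(x for x in decomposed if unicodedata.category(x)[0] == 'L').lower()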

Is there any better way to do this?


asked Jan 01 '12 18:01

1 Answer

A possible solution would be

import string
import unicodedata

def remove_accents(data):
    # Keep only ASCII letters after NFKD decomposition.
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters).lower()

As far as I know, NFKD is the standard way to normalize Unicode into its compatibility decomposition: an accented letter becomes its base letter followed by a combining mark. To remove the digits, the special characters, and the combining marks produced by normalization, you can simply compare each character against string.ascii_letters and drop any character not in that set.
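As a minimal sketch of the filtering described above, assuming the goal is to keep only the ASCII letters a-z and A-Z (the ASCII_LETTERS name is just an illustrative helper, not part of any library):

```python
import string
import unicodedata

# Hypothetical helper: a set makes the membership test explicit and cheap.
ASCII_LETTERS = frozenset(string.ascii_letters)

def remove_accents(data):
    # NFKD splits 'é' into 'e' followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize('NFKD', data)
    # Drop combining marks, digits and punctuation; keep ASCII letters only.
    return ''.join(ch for ch in decomposed if ch in ASCII_LETTERS).lower()

print(remove_accents('Frédér8ic@'))  # frederic
```

One caveat with this approach: letters that have no decomposition, such as 'ø' or 'ß', are not ASCII after normalization and are dropped entirely. Filtering on unicodedata.category(x)[0] == 'L' instead would keep them unchanged.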

answered Oct 03 '22 15:10

Abhijit


Fred
