
Removing accents and special characters [duplicate]

Possible Duplicate:
What is the best way to remove accents in a python unicode string?
Python and character normalization

I would like to remove accents, turn all characters to lowercase, and delete any numbers and special characters.

Example:

Frédér8ic@ --> frederic

Proposal:

import unicodedata

def remove_accents(data):
    # Decompose accented characters, then keep only Unicode letters (category L*).
    return ''.join(x for x in unicodedata.normalize('NFKD', data)
                   if unicodedata.category(x)[0] == 'L').lower()

Is there any better way to do this?

Fred asked Jan 01 '12 18:01



1 Answer

A possible solution would be:

import string
import unicodedata

def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters).lower()

As far as I know, NFKD is the standard way to normalize Unicode into compatibility-decomposed characters: an accented letter such as é is split into its base letter and a separate combining mark. To remove the rest (digits, special characters, and the non-ASCII characters that normalization produces), compare each character against string.ascii_letters and drop anything that is not in that set.
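As a rough illustration of how the two filters discussed here differ (the Unicode category check from the question and the string.ascii_letters comparison), the sketch below runs both on the example input. The function names keep_unicode_letters and keep_ascii_letters are made up for this comparison, and the code assumes Python 3, where str is already Unicode.

```python
import string
import unicodedata

# Set of ASCII letters for fast membership tests.
ASCII_LETTERS = frozenset(string.ascii_letters)


def keep_unicode_letters(data):
    """Question's approach: keep any character whose Unicode category starts with 'L'."""
    decomposed = unicodedata.normalize('NFKD', data)
    return ''.join(ch for ch in decomposed
                   if unicodedata.category(ch)[0] == 'L').lower()


def keep_ascii_letters(data):
    """Answer's approach: keep only characters found in string.ascii_letters."""
    decomposed = unicodedata.normalize('NFKD', data)
    return ''.join(ch for ch in decomposed if ch in ASCII_LETTERS).lower()


# Both filters agree on the example from the question.
print(keep_unicode_letters('Frédér8ic@'))  # frederic
print(keep_ascii_letters('Frédér8ic@'))    # frederic

# They differ for letters that have no decomposition, such as 'Ł'.
print(keep_unicode_letters('Łódź'))  # łodz
print(keep_ascii_letters('Łódź'))    # odz
```

The category check keeps letters from any script, so a character with no decomposition (such as Ł) survives, while the ASCII filter drops it. Which behaviour is preferable depends on whether the result has to be pure ASCII.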

Abhijit answered Oct 03 '22 15:10