How to remove accents from values in columns?

Tags: python, pandas, dataframe

How do I replace accented characters with their plain, unaccented letters? This is my dataframe:

In [56]: cities
Out[56]:

Table Code  Country         Year        City        Value       
240         Åland Islands   2014.0      MARIEHAMN   11437.0 1
240         Åland Islands   2010.0      MARIEHAMN   5829.5  1
240         Albania         2011.0      Durrës      113249.0
240         Albania         2011.0      TIRANA      418495.0
240         Albania         2011.0      Durrës      56511.0

I want it to look like this:

In [56]: cities
Out[56]:

Table Code  Country         Year        City        Value       
240         Aland Islands   2014.0      MARIEHAMN   11437.0 1
240         Aland Islands   2010.0      MARIEHAMN   5829.5  1
240         Albania         2011.0      Durres      113249.0
240         Albania         2011.0      TIRANA      418495.0
240         Albania         2011.0      Durres      56511.0

409

asked Jun 20 '16 15:06

Marius

2 Answers

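The pandas way is to chain the vectorised str.normalize with str.encode and str.decode. NFKD normalisation splits each accented character into a base letter plus a combining mark, and encoding to ASCII with errors='ignore' then drops the marks: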

In [60]:
df['Country'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')

Out[60]:
0    Aland Islands
1    Aland Islands
2          Albania
3          Albania
4          Albania
Name: Country, dtype: object
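
To see why the chain works, here is a minimal sketch of the same three steps applied to a single string with the standard library's unicodedata module. The helper name strip_accents is made up for illustration and is not part of pandas.

```python
import unicodedata

def strip_accents(text):
    # Step 1: NFKD splits "Å" into "A" plus a combining ring, "ë" into "e" plus a diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Step 2: the combining marks are not ASCII, so errors="ignore" silently drops them.
    ascii_bytes = decomposed.encode("ascii", errors="ignore")
    # Step 3: turn the remaining ASCII bytes back into a str.
    return ascii_bytes.decode("ascii")

print(strip_accents("Åland Islands"))  # Aland Islands
print(strip_accents("Durrës"))         # Durres
```

The .str accessor in the answer simply applies these steps to every element of the column.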

To do this for every string (object dtype) column:

In [64]:
cols = df.select_dtypes(include=['object']).columns
df[cols] = df[cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
df

Out[64]:
   Table Code        Country    Year       City      Value
0         240  Aland Islands  2014.0  MARIEHAMN  11437.0 1
1         240  Aland Islands  2010.0  MARIEHAMN  5829.5  1
2         240        Albania  2011.0     Durres   113249.0
3         240        Albania  2011.0     TIRANA   418495.0
4         240        Albania  2011.0     Durres    56511.0
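
For anyone who wants to try this end to end, the following is a self-contained sketch that rebuilds a few rows of the question's data and applies the same chain. The frame contents are a subset of the question's table, the helper name to_ascii is hypothetical, and selecting columns with exclude="number" is my own assumption (it treats every non-numeric column as text), not what the answer above uses.

```python
import pandas as pd

# A small frame rebuilt from a subset of the question's rows and columns.
cities = pd.DataFrame({
    "Country": ["Åland Islands", "Albania", "Albania"],
    "Year": [2014.0, 2011.0, 2011.0],
    "City": ["MARIEHAMN", "Durrës", "TIRANA"],
})

def to_ascii(col):
    # Same chain as in the answer: decompose, drop non-ASCII marks, decode back to str.
    return (col.str.normalize("NFKD")
               .str.encode("ascii", errors="ignore")
               .str.decode("ascii"))

# Assumption: every non-numeric column holds text.
text_cols = cities.select_dtypes(exclude="number").columns
cities[text_cols] = cities[text_cols].apply(to_ascii)

print(cities)
```

Numeric columns such as Year are left untouched because they are never passed through the .str accessor.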

151

answered Oct 07 '22 03:10

EdChum

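For a single pandas Series, the third-party unidecode package also works:

import unidecode

def remove_accents(a):
    # Python 3 strings are already text; decode only if a value arrives as bytes.
    text = a.decode('utf-8') if isinstance(a, bytes) else a
    return unidecode.unidecode(text)

df['column'] = df['column'].apply(remove_accents)

Here unidecode transliterates each value to its closest ASCII representation.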
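
One caveat with the NFKD-then-ASCII approach from the first answer: letters that have no Unicode decomposition, such as Ł or ø, are dropped entirely instead of being mapped to L or o. If that matters for your data, a variant that removes only the combining marks keeps those letters (they stay non-ASCII). A rough sketch using only the standard library; both helper names are hypothetical:

```python
import unicodedata

def ascii_only(text):
    # The first answer's approach: anything that is not ASCII after NFKD is discarded.
    return unicodedata.normalize("NFKD", text).encode("ascii", errors="ignore").decode("ascii")

def strip_combining_marks(text):
    # Remove only combining marks (accents); keep every other character as-is.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(ascii_only("Łódź"))             # odz
print(strip_combining_marks("Łódź"))  # Łodz
```

Either helper can be applied to a column with df['column'].map(...), assuming the column contains only strings.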

answered Oct 07 '22 04:10

Caio Andrian
