Convert special text to Latin characters in Python

Tags:

python

pandas

I have the following pandas data frame:

import pandas as pd

the_df = pd.DataFrame({'id':[1,2],'name':['Joe','𝒮𝒶𝓇𝒶𝒽']})
the_df
    id  name
0   1   Joe
1   2   𝒮𝒶𝓇𝒶𝒽

As you can see, we can read the second name as "Sarah", but it's written with special characters.

I want to create a new column with these characters converted to Latin characters. I have tried this approach:

the_df['latin_name'] = the_df['name'].str.extract(r'(^[a-zA-Z\s]*)')
the_df
    id  name    latin_name
0   1   Joe     Joe
1   2   𝒮𝒶𝓇𝒶𝒽

But the pattern doesn't match these letters, so the second row comes back empty. Any help on this would be greatly appreciated.


asked Aug 05 '21 17:08

Alexis

2 Answers

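Try .str.normalize with the 'NFKC' form before extracting. It replaces compatibility characters, such as these mathematical script letters, with their plain equivalents: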

the_df['name'].str.normalize('NFKC').str.extract(r'(^[a-zA-Z\s]*)')

Output:

       0
0    Joe
1  Sarah
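
For completeness, here is a minimal end-to-end sketch of the same idea that writes the result back into the frame. It assumes the column holds only strings; the latin_names list is just an illustrative name used to inspect the result, and expand=False is passed so that extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

the_df = pd.DataFrame({'id': [1, 2], 'name': ['Joe', '𝒮𝒶𝓇𝒶𝒽']})

# NFKC maps compatibility characters (here, mathematical script letters)
# to their plain equivalents before the regex is applied.
normalized = the_df['name'].str.normalize('NFKC')

# Keep the leading run of ASCII letters and whitespace, as in the question.
# expand=False returns a Series for a single capture group.
the_df['latin_name'] = normalized.str.extract(r'(^[a-zA-Z\s]*)', expand=False)

# Illustrative variable for checking the new column.
latin_names = the_df['latin_name'].tolist()
print(the_df)
```

If the extract step is not needed (for example, when names may contain digits or punctuation), assigning the normalized Series directly should be enough.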

answered Oct 09 '22 00:10

Scott Boston

You can use unicodedata.normalize:

>>> import unicodedata
>>> the_df['name'].apply(lambda x: unicodedata.normalize('NFKD', x))
0      Joe
1    Sarah
Name: name, dtype: object
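
To see why the original regex failed and what the normalization does, a small standard-library sketch may help. The to_latin helper is a hypothetical name introduced here for illustration, and the name is written with escapes so the code points are explicit. The last lines show one difference between the two forms: 'NFKD' leaves accented letters decomposed into a base letter plus a combining mark, while 'NFKC' recomposes them.

```python
import unicodedata

# The second name from the question, spelled out as code points.
fancy = '\U0001D4AE\U0001D4B6\U0001D4C7\U0001D4B6\U0001D4BD'

# These are not ASCII letters, which is why [a-zA-Z] does not match them.
first_char_name = unicodedata.name(fancy[0])


def to_latin(text):
    """Hypothetical helper: apply compatibility decomposition (NFKD)."""
    return unicodedata.normalize('NFKD', text)


plain = to_latin(fancy)

# For an accented letter such as U+00E9, NFKD yields two code points
# (base letter + combining accent), while NFKC yields one.
nfkd_len = len(unicodedata.normalize('NFKD', '\u00e9'))
nfkc_len = len(unicodedata.normalize('NFKC', '\u00e9'))

print(first_char_name, plain, nfkd_len, nfkc_len)
```

So for names that may also contain accents, the choice between 'NFKD' and 'NFKC' affects whether the accent stays attached to its letter.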

answered Oct 09 '22 00:10

ThePyGuy
