How do I convert decorated Latin Unicode characters to plain Latin in Python?

Tags:

unicode

Unicode defines many decorated variants of Latin characters. How can I convert these Unicode characters to plain Latin characters in Python?

To be clear, I'm not asking how to strip accents from letters. I'm asking how to convert characters that carry the same linguistic meaning but have a decorated display, such as negative, encircled, or boxed forms.

For example, how do I convert

💦°🄾🅁🄸🄶🄸🄽🄰🄻°💦 c

to

💦°ORIGINAL°💦 c

(Stripping those non-language characters will be a separate task)

465

asked Aug 22 '19 23:08

xaviersjs

1 Answer

This isn't perfect, but what you're looking for is something like Unicode compatibility decomposition. Unicode normalization and decomposition is a large topic in its own right.

For something quick and dirty, Python fortunately has this built in, via unicodedata.normalize with the 'NFKC' form:

>>> import unicodedata
>>> unicodedata.normalize('NFKC', '💦°🄾🅁🄸🄶🄸🄽🄰🄻°💦 c')
'💦°ORIGINAL°💦 c'
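As for the "isn't perfect" part: NFKC only helps when a character has a compatibility decomposition in the Unicode database. The squared letters in the question do, but as far as I know the "negative" variants (for example U+1F170 NEGATIVE SQUARED LATIN CAPITAL LETTER A) do not, so normalize() leaves them untouched. Below is a minimal sketch of one possible fallback that reads the base letter out of the official character name. The helper names `_letter_from_name` and `to_plain` are made up for illustration, and the name-matching rule is an assumption that only targets names of the form "NEGATIVE ... LATIN CAPITAL LETTER X".

```python
import unicodedata

# Hypothetical helpers, not part of the standard library.
_MARKER = "LATIN CAPITAL LETTER "


def _letter_from_name(ch):
    """Map e.g. 'NEGATIVE SQUARED LATIN CAPITAL LETTER A' to 'A'.

    Assumption: only names that start with 'NEGATIVE ' and end in a
    single ASCII letter are handled; everything else is returned as-is.
    """
    name = unicodedata.name(ch, "")  # "" when the character has no name
    if name.startswith("NEGATIVE ") and _MARKER in name:
        base = name.rsplit(" ", 1)[-1]
        if len(base) == 1 and base.isascii() and base.isalpha():
            return base
    return ch


def to_plain(text):
    """Apply NFKC to the whole string, then the name-based fallback."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(_letter_from_name(ch) for ch in folded)


# unicodedata.decomposition() shows whether NFKC can help at all:
# it returns an empty string when there is no decomposition mapping.
print(repr(unicodedata.decomposition("\U0001F13E")))  # squared O
print(repr(unicodedata.decomposition("\U0001F170")))  # negative squared A
print(to_plain("\U0001F170\U0001F171 \U0001F13E\U0001F141"))
```

Normalizing the whole string first (instead of character by character) keeps the usual NFKC composition behaviour, and the fallback only touches characters that NFKC left alone. Emoji and symbols such as 💦 and ° pass through unchanged, which matches the question's note that stripping them is a separate task.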

118

answered Sep 23 '22 08:09

Alyssa Haroldsen
