
Approximately converting unicode string to ascii string in python

Tags:

python

string

unicode

ascii

I don't know whether this is trivial or not, but I need to convert a Unicode string to an ASCII string, and I'd rather not have all those escape characters around. Is it possible to get an "approximate" conversion to a similar-looking ASCII character?

For example, Gavin O’Connor gets converted to Gavin O\x92Connor, but I'd really like it to be converted to just Gavin O'Connor. Is this possible? Has anyone written a utility to do it, or do I have to replace all the characters manually?

Thank you very much! Marco


asked Nov 10 '11 22:11

Marco Moschettini

2 Answers

Use the Unidecode package to transliterate the string.

    >>> import unidecode
    >>> unidecode.unidecode(u'Gavin O’Connor')
    "Gavin O'Connor"

answered Sep 22 '22 07:09

Petr Viktorin

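    import unicodedata

    unicode_string = u"Gavin O’Connor"
    # NFKD splits accented letters into a base letter plus a combining mark;
    # encoding with 'ignore' then drops whatever is still non-ASCII.
    print(unicodedata.normalize('NFKD', unicode_string).encode('ascii', 'ignore').decode('ascii'))

Output:

    Gavin OConnor

Note that the curly apostrophe (U+2019) has no decomposition, so it is dropped rather than turned into '. Accented letters such as é are reduced to their base letter.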

Here's the document that describes the normalization forms: http://unicode.org/reports/tr15/
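If punctuation should survive as well, one option is to map it explicitly before normalizing. The following is only a minimal sketch using the standard library: the helper name `to_ascii` and the table `PUNCTUATION` are made up for illustration, and the table is a hand-picked assumption that you would extend for your own data.

```python
import unicodedata

# Hypothetical, hand-picked table: extend it for the punctuation in your data.
PUNCTUATION = {
    u"\u2018": u"'",   # left single quotation mark
    u"\u2019": u"'",   # right single quotation mark
    u"\u201c": u'"',   # left double quotation mark
    u"\u201d": u'"',   # right double quotation mark
    u"\u2013": u"-",   # en dash
    u"\u2014": u"-",   # em dash
}

def to_ascii(text):
    """Approximate `text` using ASCII characters only."""
    # Replace punctuation that NFKD cannot decompose.
    for char, replacement in PUNCTUATION.items():
        text = text.replace(char, replacement)
    # Split accented letters into base letter + combining mark,
    # then drop everything that is still outside ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii(u"Gavin O\u2019Connor"))                  # Gavin O'Connor
print(to_ascii(u"Caf\u00e9 \u201cM\u00fcnchen\u201d"))   # Cafe "Munchen"
```

Characters that have no decomposition and are not in the table (ß or ø, for example) are still silently dropped, which is where a full transliteration table such as Unidecode's is the more complete choice.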

answered Sep 23 '22 07:09

Acorn
