How do I get str.translate to work with Unicode strings?

Tags: python, string, unicode

I have the following code:

    import string

    def translate_non_alphanumerics(to_translate, translate_to='_'):
        not_letters_or_digits = u'!"#%\'()*+,-./:;<=>?@[\]^_`{|}~'
        translate_table = string.maketrans(not_letters_or_digits,
                                           translate_to
                                             *len(not_letters_or_digits))
        return to_translate.translate(translate_table)

Which works great for non-unicode strings:

    >>> translate_non_alphanumerics('<foo>!')
    '_foo__'

But fails for unicode strings:

    >>> translate_non_alphanumerics(u'<foo>!')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in translate_non_alphanumerics
    TypeError: character mapping must return integer, None or unicode

I can't make any sense of the paragraph on "Unicode objects" in the Python 2.6.2 docs for the str.translate() method.

How do I make this work for Unicode strings?

566

asked Aug 24 '09 18:08

Daryl Spitzer

1 Answer

The Unicode version of translate requires a mapping from Unicode ordinals (which you can retrieve for a single character with ord) to Unicode ordinals. If you want to delete characters, you map to None.
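As a small illustration of that mapping, here is a sketch that assumes Python 3, where every str is a Unicode string (in Python 2 the same table would be applied to a u'...' literal). The name table is just a placeholder:

```python
# Assumes Python 3, where every str is Unicode.
# Keys are Unicode ordinals; values are ordinals or None.
table = {
    ord('<'): ord('_'),  # ordinal -> ordinal: replace '<' with '_'
    ord('>'): ord('_'),  # ordinal -> ordinal: replace '>' with '_'
    ord('!'): None,      # ordinal -> None: delete '!'
}

result = '<foo>!'.translate(table)
print(result)  # _foo_
```

Characters that have no entry in the table pass through unchanged.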

I changed your function to build a dict that maps the ordinal of each character to be replaced to its replacement:

    def translate_non_alphanumerics(to_translate, translate_to=u'_'):
        not_letters_or_digits = u'!"#%\'()*+,-./:;<=>?@[\]^_`{|}~'
        translate_table = dict((ord(char), translate_to) for char in not_letters_or_digits)
        return to_translate.translate(translate_table)

    >>> translate_non_alphanumerics(u'<foo>!')
    u'_foo__'

edit: It turns out that the translation mapping must map from the Unicode ordinal (via ord) to either another Unicode ordinal, a Unicode string, or None (to delete). I have thus changed the default value for translate_to to be a Unicode literal. For example:

    >>> translate_non_alphanumerics(u'<foo>!', u'bad')
    u'badfoobadbad'
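For readers on Python 3, where every str is already Unicode and string.maketrans is no longer available, the same idea might look like the sketch below. The function name translate_non_alphanumerics_py3 is made up for illustration and is not part of the original answer; str.maketrans is used here only to turn the single-character keys into ordinals.

```python
# A sketch assuming Python 3; the function name is hypothetical.
def translate_non_alphanumerics_py3(to_translate, translate_to='_'):
    # Backslash is written as '\\' so the literal has no invalid escape.
    not_letters_or_digits = '!"#%\'()*+,-./:;<=>?@[\\]^_`{|}~'
    # str.maketrans accepts a dict with one-character keys and converts
    # them to ordinals; values may be an ordinal, a string, or None.
    table = str.maketrans({char: translate_to for char in not_letters_or_digits})
    return to_translate.translate(table)


print(translate_non_alphanumerics_py3('<foo>!'))         # _foo__
print(translate_non_alphanumerics_py3('<foo>!', 'bad'))  # badfoobadbad
print(translate_non_alphanumerics_py3('<foo>!', None))   # foo
```

Passing None as translate_to deletes the listed characters instead of replacing them, which matches the "map to None" rule described above.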

187

answered Sep 24 '22 00:09

Mike Boers
