Detecting Unicode private use area characters with Python

Tags: python, python-3.x, unicode

What is the proper way to identify Unicode private use characters in Python 3? There's nothing obviously relevant in the unicodedata module, which otherwise makes it easy to look up character names and attributes.

Some background: unicodedata.name(), which returns the name of a Unicode character, raises a ValueError when called with a private use character (e.g., try unicodedata.name("\uf026")). But whitespace characters (except for space itself), and possibly other things, also trigger the exception, so a failed name lookup alone does not identify a private use character. So what's a non-hacky, reliable way to detect PUA characters?
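To make the ambiguity concrete, here is a small sketch of the behavior described above. The helper name has_name is hypothetical and only wraps unicodedata.name(); the general category is printed alongside to show that the failing characters belong to different categories.

```python
import unicodedata

def has_name(c):
    """Hypothetical helper: True if unicodedata.name() succeeds for c."""
    try:
        unicodedata.name(c)
        return True
    except ValueError:
        return False

# A private use character, two control characters, a space, and a letter.
for ch in ("\uf026", "\n", "\t", " ", "A"):
    print(f"U+{ord(ch):04X}", has_name(ch), unicodedata.category(ch))
```

Both the private use character and the control characters fail the name lookup, so the exception by itself cannot tell them apart.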


asked Sep 12 '15 14:09

alexis

1 Answer

Private use characters are all in the Co general category, as returned by category() in unicodedata:

>>> import unicodedata
>>> def is_pua(c):
...   return unicodedata.category(c) == 'Co'
...
>>> is_pua(u'\uF026')
True
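As a possible usage sketch, the same category test can be applied to a whole string to report where private use characters occur. The function name find_pua and the sample string are illustrative assumptions, not part of the original answer.

```python
import unicodedata

def find_pua(text):
    """Return (index, code point label) pairs for each private use character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]

# Sample mixing ordinary text, a BMP private use character,
# a newline, and a plane 15 private use character.
sample = "a\uf026b\n\U000f0000"
print(find_pua(sample))  # [(1, 'U+F026'), (4, 'U+F0000')]
```

The newline is skipped because its category is Cc, not Co, which is the distinction that unicodedata.name() cannot make.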

Given that the Unicode Standard guarantees that the set of private use characters will never change (no characters will ever be added or removed), it's also safe to hard-code the three ranges:

U+E000 to U+F8FF
U+F0000 to U+FFFFD
U+100000 to U+10FFFD
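A minimal sketch of the hard-coded variant, using the three ranges listed above. The names PUA_RANGES and is_pua_by_range are hypothetical; the approach avoids the unicodedata lookup and relies only on the ranges staying fixed.

```python
# Inclusive code point ranges of the three private use areas.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
)

def is_pua_by_range(c):
    """Return True if the single character c falls in a private use range."""
    cp = ord(c)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

print(is_pua_by_range("\uf026"))      # True
print(is_pua_by_range("\U000ffffe"))  # False: just past the end of Area-A
```

Note that the last two code points of planes 15 and 16 (for example U+FFFFE and U+FFFFF) are noncharacters rather than private use characters, which is why the upper bounds end in FFFD.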

answered Sep 26 '22 17:09

一二三

