It appears, based on a urwid example that <code>u'\N{HYPHEN BULLET}</code> will create a unicode character that is a hyphen intended for a bullet. The names for unicode characters seem to be defined at fileformat.info and some element of using Unicode in Python appears in the howto documentation. Though there is no mention of the <code>\N{}</code> syntax. If you pull all these docs together you get the idea that the constant <code>u"\N{HYPHEN BULLET}"</code> creates a ⁃ However, this is all a theory based on pulling all this data together. I can find no documentation for <code>"\N{}</code> in the Python docs. My question is whether my theory of operation is correct and whether it is documented anywhere?

Not every gory detail can be found in a how-to. The table of escape sequences in the reference manual includes: Escape Sequence: <code>\N{name}</code> Meaning: Character named <code>name</code> in the Unicode database (Unicode only)

Details of Unicode Names \N Documented? [duplicate]

2 Answers

Not every gory detail can be found in a how-to. The table of escape sequences in the reference manual includes:

Escape Sequence: \N{name}
Meaning: Character named name in the Unicode database (Unicode only)

110

answered Oct 12 '22 11:10

Stop harming Monica

You are correct that u"\N{CHARACTER NAME} produces a valid unicode character in Python.

It is not documented much in the Python docs, but after some searching I found a reference to it on effbot.org

http://effbot.org/librarybook/ucnhash.htm

The ucnhash module

(Implementation, 2.0 only) This module is an implementation module, which provides a name to character code mapping for Unicode string literals. If this module is present, you can use \N{} escapes to map Unicode character names to codes.

In Python 2.1, the functionality of this module was moved to the unicodedata module.

Checking the documentation for unicodedata shows that the module is using the data from the Unicode Character Database.

unicodedata — Unicode Database

This module provides access to the Unicode Character Database (UCD) which defines character properties for all Unicode characters. The data contained in this database is compiled from the UCD version 9.0.0.

The full data can be found at: https://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt

The data has the structure: HEXVALUE;CHARACTER NAME;etc.. so you could use this data to look up characters.

For example:

# 0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
>>> u"\N{LATIN CAPITAL LETTER A}"
'A'

# FF7B;HALFWIDTH KATAKANA LETTER SA;Lo;0;L;<narrow> 30B5;;;;N;;;;;
>>> u"\N{HALFWIDTH KATAKANA LETTER SA}"
'ｻ'

answered Oct 12 '22 09:10

alxwrd

Related questions
                            
                                Converting a 2D numpy array to dataframe rows
                            
                                Shift NaNs to the end of their respective rows
                            
                                Accessing django database from python script
                            
                                solving Ax =b for a non-square matrix A using python
                            
                                Pandas dataframe group by multiple columns
                            
                                Retrieve the object ID in GraphQL
                            
                                Tensorflow Dataset.from_generator fails with pyfunc exception
                            
                                How to add DateTimeField in django without microsecond
                            
                                How to replace every NaN in a column with different random values using pandas?
                            
                                How to disable chrome's "save password" popup in selenium webdriver (python)
                            
                                How to do multi-class image classification in keras?
                            
                                How to use numba.jit with methods
                            
                                How to pip list with package urls
                            
                                Selecting a subset using dropna() to select multiple columns
                            
                                What is the recommended way to persist (pickle) custom sklearn pipelines?
                            
                                RandomForestRegressor and feature_importances_ error
                            
                                (python) Telegram bot- how to send messages periodically?
                            
                                How do you change ticks label sizes using Python's Bokeh?
                            
                                TensorFlow : How do i find my output node in my Tensorflow trained model?
                            
                                How to pass DataFrame as input to Spark UDF?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Details of Unicode Names \N Documented? [duplicate]

Tags:

python

unicode

python-2.7

Ray Salemi

People also ask

2 Answers

Stop harming Monica

The ucnhash module

unicodedata — Unicode Database

alxwrd

Recent Activity

Donate For Us