I have begun to look through the Python Standard Library documentation (http://docs.python.org/3/library/functions.html) in an attempt to further familiarise myself with basic Python. I don't find the explanation of the ascii() function clear. Could someone supply a concise explanation, with examples of situations in which the ascii() function is useful?


Python - The Standard Library - ascii( ) Function

2 Answers

ascii() returns the same string as repr(), except that every codepoint outside the ASCII range in that output is replaced by an escape sequence.

So a Latin-1 codepoint like ë is represented by the Python escape sequence \xeb instead.

This was the standard repr() output in Python 2; in Python 3, repr() leaves most Unicode codepoints as their actual value in the output, as long as they are printable characters:

>>> print(repr('ë'))
'ë'
>>> print(ascii('ë'))
'\xeb'

Both outputs are valid Python string literals, but the latter uses just ASCII characters, while the former requires a Unicode-compatible encoding.
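To make the "valid string literal" point concrete, here is a minimal sketch. It assumes Python 3.7 or later (for str.isascii()); the variable names text and shown are only illustrative.

```python
import ast

text = '\u00eb'        # the character ë
shown = ascii(text)    # the string '\xeb', including its quotes

# The ascii() output contains only ASCII characters, so it can be
# encoded with the ASCII codec without raising an error.
assert shown.isascii()
encoded = shown.encode('ascii')

# Evaluating the output as a Python literal gives back the original string.
assert ast.literal_eval(shown) == text

# The repr() output, by contrast, still contains the non-ASCII character.
assert not repr(text).isascii()
```

This round trip is what makes the ascii() form convenient for pasting into a source file or a question when the encoding in between cannot be trusted.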

Non-ASCII codepoints up to U+00FF use the \xhh form shown above. For Unicode codepoints between U+0100 and U+FFFF, \uxxxx escape sequences are used; for anything above that, the \Uxxxxxxxx form is used. See the available escape code syntax for Python string literals.
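As a quick illustration of the three escape forms, the sketch below runs ascii() on one sample character from each range. The sample characters are arbitrary choices, written as escapes so the snippet does not depend on the source file encoding.

```python
# One sample character per range; the choices are arbitrary.
samples = {
    'U+00EB': '\u00eb',        # ë, within U+0080..U+00FF
    'U+20AC': '\u20ac',        # euro sign, within U+0100..U+FFFF
    'U+1F600': '\U0001f600',   # an emoji, above U+FFFF
}

results = {name: ascii(ch) for name, ch in samples.items()}

for name, escaped in results.items():
    print(name, escaped)
# U+00EB '\xeb'
# U+20AC '\u20ac'
# U+1F600 '\U0001f600'
```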

Like repr(), ascii() is a very helpful debugging tool, especially when you need to see the exact contents of a string. Unlike repr(), the ascii() output makes many Unicode gotchas much more visible.

Take de-normalised codepoints, for example. The ë character can be represented in two ways: as the single codepoint U+00EB, or as an ASCII e followed by a combining diaeresis ¨ (codepoint U+0308):

>>> import unicodedata
>>> one, two = 'ë', unicodedata.normalize('NFD', 'ë')
>>> print(one, two)
ë ë
>>> print(repr(one), repr(two))
'ë' 'ë'
>>> print(ascii(one), ascii(two))
'\xeb' 'e\u0308'

Only with ascii() is it clear that two consists of two distinct codepoints.
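The practical consequence can be sketched as follows: the two strings look identical when printed, but differ in length and do not compare equal until both are normalised to the same form. NFC is used here purely as an example; NFD would work equally well for the comparison.

```python
import unicodedata

one = '\u00eb'                             # precomposed ë
two = unicodedata.normalize('NFD', one)    # e + combining diaeresis

print(len(one), len(two))    # 1 2
print(one == two)            # False

# Normalising both strings to the same form makes them compare equal.
same = unicodedata.normalize('NFC', one) == unicodedata.normalize('NFC', two)
print(same)                  # True
```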

134

answered Oct 01 '22 14:10

Martijn Pieters

ascii() can be useful for finding out exactly what is in a string. If a string contains whitespace or unprintable characters, or if the terminal is turning the string into mojibake because of a character-encoding mismatch, it helps to look at the ascii() representation of the string. It gives a visible and unambiguous representation of those otherwise unreadable characters, and it prints the same way on everyone's terminal.

There are frequent questions on Stack Overflow about incorrectly printed strings, and sometimes it is hard to tell what is going on because the question shows only the mojibake and not an unambiguous representation of the string. When the questioner shows the ascii() representation (or the repr() in Python 2), the situation becomes much clearer.
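A small sketch of both situations follows. The mojibake is produced by decoding UTF-8 bytes as Latin-1, which is just one assumed failure mode chosen for illustration, and the padded string is a made-up value containing a no-break space and a zero-width space.

```python
original = '\u00eb'   # ë

# Assumed failure mode for illustration: UTF-8 bytes decoded as Latin-1.
garbled = original.encode('utf-8').decode('latin-1')

print(ascii(original))   # '\xeb'
print(ascii(garbled))    # '\xc3\xab'

# Invisible characters: a no-break space followed by a zero-width space.
padded = 'name\u00a0\u200b'
print(ascii(padded))     # 'name\xa0\u200b'

# The !a conversion in an f-string applies ascii() to the value,
# which is handy in log and error messages.
message = f'got {padded!a}'
print(message)           # got 'name\xa0\u200b'
```

The ascii() output for garbled shows two codepoints where one was expected, which points directly at a decoding mismatch rather than at the data itself.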

answered Oct 01 '22 15:10

unutbu


Tags: python, python-3.x, ascii, standard-library

Phoenix
