I have begun to look through the Python Standard Library (http://docs.python.org/3/library/functions.html) in an attempt to further familiarise myself with basic Python. The explanation of the ascii() function is not clear to me.
Could someone supply a concise explanation, with examples of situations in which one might usefully use ascii()?
Python ascii() function: ascii() returns a readable version of any object (strings, tuples, lists, etc.), replacing any non-ASCII characters with escape sequences. For example, å is replaced with \xe5.
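As a rough illustration of the container case mentioned above, here is a minimal sketch. The list contents and variable names are made up for the example.

```python
# Hypothetical sample data; the names are illustrative only.
names = ['Zo\u00eb', '\u00e5', 'plain']  # 'Zoë', 'å', 'plain'

as_repr = repr(names)    # keeps printable non-ASCII characters as they are
as_ascii = ascii(names)  # same structure, but non-ASCII characters are escaped

print(as_ascii)             # ['Zo\xeb', '\xe5', 'plain']
print(as_repr == as_ascii)  # False: only the ascii() form is pure ASCII
```

The escaping applies to every string nested inside the container, not just to top-level strings.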
To get the ASCII value of a character in Python, use ord(). It is a built-in function that returns the integer code point of a one-character string. ASCII, the American Standard Code for Information Interchange, is a character encoding that assigns numeric values to characters and symbols.
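A short sketch of ord() and its inverse chr() may help here. The variable names are only for illustration.

```python
# ord() maps a one-character string to its integer code point;
# chr() goes the other way.
code_a = ord('A')        # 65, inside the ASCII range 0-127
code_e = ord('\u00eb')   # 235 (0xEB) for 'ë', outside the ASCII range
round_trip = chr(code_a)

print(code_a, code_e, round_trip)  # 65 235 A
```

Note that ord() answers a different question from ascii(): it gives a number for one character, while ascii() gives an escaped, printable representation of a whole object.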
The Python Standard Library is a collection of modules available to every Python program. It simplifies programming by removing the need to rewrite commonly used code. Modules are used by importing them, usually at the beginning of a script.
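Note that the similarly named ASCII function found in some SQL dialects is a different thing: it returns the decimal representation of the first character in a character string, based on its codepoint in the ASCII character set, and takes a single argument of any character data type. In Python, that role is played by ord(), not by ascii().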
ascii() is a function that encodes the output of repr() to use escape sequences for any codepoint in that output that is not within the ASCII range. So a Latin-1 codepoint like ë is represented by the Python escape sequence \xeb instead.
This was the standard representation in Python 2; Python 3 repr() leaves most Unicode codepoints as their actual value in the output, as long as it is a printable character:
>>> print(repr('ë'))
'ë'
>>> print(ascii('ë'))
'\xeb'
Both outputs are valid Python string literals, but the latter uses just ASCII characters, while the former requires a Unicode-compatible encoding.
For Unicode codepoints between U+0100 and U+FFFF, \uxxxx escape sequences are used; for anything above that, the \Uxxxxxxxx form is used. See the available escape code syntax for Python string literals.
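The three escape forms can be seen side by side in a small sketch. The sample characters and the dictionary name are chosen only for illustration.

```python
# One sample character per escape form; keys are the Unicode character names.
samples = {
    'LATIN SMALL LETTER E WITH DIAERESIS': '\u00eb',  # below U+0100
    'GREEK SMALL LETTER ALPHA': '\u03b1',             # U+0100 to U+FFFF
    'GRINNING FACE': '\U0001f600',                    # above U+FFFF
}

escaped = {name: ascii(ch) for name, ch in samples.items()}

for name, text in escaped.items():
    print(f'{name}: {text}')
# LATIN SMALL LETTER E WITH DIAERESIS: '\xeb'
# GREEK SMALL LETTER ALPHA: '\u03b1'
# GRINNING FACE: '\U0001f600'
```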
Like repr(), ascii() is a very helpful debugging tool, especially when it comes to the exact contents of a string. Unlike repr(), the ascii() output makes many Unicode gotchas much more visible.
Take de-normalised codepoints, for example. The ë character can be represented in two ways: as the single U+00EB codepoint, or as an ASCII e plus a combining diaeresis ¨ (codepoint U+0308):
>>> import unicodedata
>>> one, two = 'ë', unicodedata.normalize('NFD', 'ë')
>>> print(one, two)
ë ë
>>> print(repr(one), repr(two))
'ë' 'ë'
>>> print(ascii(one), ascii(two))
'\xeb' 'e\u0308'
Only with ascii() is it clear that two consists of two distinct codepoints.
ascii() can be useful for finding out exactly what is in a string. If a string has whitespace or unprintable characters, or if the terminal is turning the string into mojibake because of a character-encoding mismatch, it is useful to look at the ascii() representation of the string. It provides a visible and unambiguous representation of those otherwise unreadable characters, and it prints the same way on everyone's terminal.
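One case where this helps is look-alike characters. The sketch below uses made-up strings to show two values that render almost identically but compare unequal. In Python 3, repr() already escapes non-printable characters such as the no-break space, so the extra value of ascii() is mostly for printable non-ASCII characters like the Cyrillic letter here.

```python
latin = 'paypal'
lookalike = 'p\u0430ypal'  # second letter is CYRILLIC SMALL LETTER A (U+0430)

print(latin == lookalike)  # False, although both display as "paypal"
print(ascii(latin))        # 'paypal'
print(ascii(lookalike))    # 'p\u0430ypal'

# Whitespace also becomes explicit: a tab and a trailing no-break space.
padded = 'tab\there\u00a0'
print(ascii(padded))       # 'tab\there\xa0'
```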
There are frequent questions on Stack Overflow about incorrectly printed strings, and sometimes it is hard to tell what is going on because the question shows only the mojibake and not an unambiguous representation of the string. When the questioner shows the ascii() representation (or the repr() in Python 2), the situation becomes much clearer.
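To make that concrete, here is a minimal sketch of one common kind of mojibake, assuming UTF-8 bytes were mistakenly decoded as Latin-1. The sample word and variable names are illustrative.

```python
original = 'caf\u00e9'  # 'café'

# Simulate the mismatch: UTF-8 bytes wrongly decoded as Latin-1.
mojibake = original.encode('utf-8').decode('latin-1')

print(ascii(original))  # 'caf\xe9'
print(ascii(mojibake))  # 'caf\xc3\xa9'

# The escapes \xc3\xa9 are the UTF-8 bytes of U+00E9, which hints at the fix:
repaired = mojibake.encode('latin-1').decode('utf-8')
print(ascii(repaired))  # 'caf\xe9'
```

Seeing \xc3\xa9 where a single \xe9 was expected is a strong hint that UTF-8 data was decoded with a single-byte codec, something that is hard to tell from the garbled display alone.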