
Python - The Standard Library - ascii( ) Function

I have begun to look through the built-in functions section of the Python Standard Library documentation (http://docs.python.org/3/library/functions.html) in an attempt to further familiarise myself with basic Python. The explanation of the ascii() function isn't clear to me.

Could someone supply a concise explanation, with examples of situations in which the ascii() function is useful?

Phoenix asked Feb 11 '14 12:02



2 Answers

ascii() returns the same string as repr(), except that every codepoint in that output outside the ASCII range is replaced with an escape sequence.

So a Latin 1 codepoint like ë is represented by the Python escape sequence \xeb instead.

This was the standard representation in Python 2; Python 3 repr() leaves most Unicode codepoints as their actual value in the output, as long as it is a printable character:

>>> print(repr('ë'))
'ë'
>>> print(ascii('ë'))
'\xeb'

Both outputs are valid Python string literals, but the latter uses just ASCII characters, while the former requires a Unicode-compatible encoding.
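As a rough illustration of that point, here is a minimal sketch. The sample string and variable names are assumptions made up for the demo. It shows that the ascii() output can be encoded as pure ASCII and evaluated back into the original string, whereas the repr() output cannot be encoded as ASCII:

```python
import ast

text = 'Zo\xeb'  # hypothetical sample string, 'Zoë'

escaped = ascii(text)  # contains only ASCII characters
plain = repr(text)     # keeps the printable ë as-is in Python 3

# The ascii() output survives an ASCII-only channel ...
escaped_bytes = escaped.encode('ascii')

# ... while the repr() output does not.
try:
    plain.encode('ascii')
    repr_is_ascii = True
except UnicodeEncodeError:
    repr_is_ascii = False

# The escaped form is still a valid string literal for the same string.
round_trip = ast.literal_eval(escaped)

print(escaped)
print(round_trip == text, repr_is_ascii)
```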

For Unicode codepoints between U+0100 and U+FFFF, \uxxxx escape sequences are used; for anything above that, the \Uxxxxxxxx form is used. See the available escape code syntax for Python string literals.
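A small sketch of the escape forms, using three sample characters picked purely for illustration (one Latin 1 codepoint, one in the U+0100 to U+FFFF range, and one above U+FFFF):

```python
# Hypothetical sample characters, one per escape form.
samples = {
    'latin1': '\xeb',        # U+00EB, e with diaeresis
    'bmp': '\u20ac',         # U+20AC, euro sign
    'astral': '\U0001f600',  # U+1F600, an emoji beyond U+FFFF
}

# ascii() picks the shortest escape form that fits the codepoint.
escaped = {name: ascii(ch) for name, ch in samples.items()}

for name, form in escaped.items():
    print(name, form)
```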

Like repr(), ascii() is a very helpful debugging tool, especially when you need to see the exact contents of a string. Unlike repr(), the ascii() output makes many Unicode gotchas much more visible.

Take de-normalised (decomposed) codepoints, for example: the ë character can be represented in two ways, as the single U+00EB codepoint, or as an ASCII e followed by a combining diaeresis ¨ (codepoint U+0308):

>>> import unicodedata
>>> one, two = 'ë', unicodedata.normalize('NFD', 'ë')
>>> print(one, two)
ë ë
>>> print(repr(one), repr(two))
'ë' 'ë'
>>> print(ascii(one), ascii(two))
'\xeb' 'e\u0308'

Only with ascii() is it clear that two consists of two distinct codepoints.
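To sketch why this matters in practice (the variable names below are illustrative, not from the answer), the two forms look identical on screen but compare unequal until they are normalised to the same form:

```python
import unicodedata

composed = '\xeb'        # single codepoint U+00EB
decomposed = 'e\u0308'   # 'e' followed by combining diaeresis U+0308

# They render the same, but they are different strings.
same = composed == decomposed
lengths = (len(composed), len(decomposed))

# ascii() exposes the difference that print() and repr() hide.
print(ascii(composed), ascii(decomposed))

# Normalising both to one form (NFC here) makes them comparable.
same_after_nfc = unicodedata.normalize('NFC', decomposed) == composed
print(same, lengths, same_after_nfc)
```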

Martijn Pieters answered Oct 01 '22 14:10


ascii() can be useful for finding out exactly what is in a string. If a string contains whitespace or unprintable characters, or if the terminal is turning the string into mojibake because of a character-encoding mismatch, look at the ascii() representation of the string. It gives a visible and unambiguous rendering of those otherwise unreadable characters, and it prints the same way on everyone's terminal.
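A minimal sketch of both cases, with made-up sample strings. The mojibake is simulated here by decoding UTF-8 bytes as Latin 1, which is only one of many possible encoding mismatches:

```python
# Case 1: whitespace that is hard to tell apart when printed.
padded = 'name\xa0\t \n'  # non-breaking space, tab, space, newline
padded_view = ascii(padded)

# Case 2: simulated mojibake, UTF-8 bytes wrongly decoded as Latin 1.
original = '\xe9'  # U+00E9, e with acute accent
mojibake = original.encode('utf-8').decode('latin-1')
mojibake_view = ascii(mojibake)

# The escaped forms show each codepoint explicitly, on any terminal.
print(padded_view)
print(ascii(original), mojibake_view)
```

In the second case the two escapes \xc3 and \xa9 are exactly the UTF-8 bytes of the original character, which hints at which decoding went wrong.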

There are frequent questions on Stack Overflow about incorrectly printed strings, and sometimes it is hard to tell what is going on because the question shows only the mojibake and not an unambiguous representation of the string. When the questioner shows the ascii() representation (or the repr() in Python 2), the situation becomes much clearer.

unutbu answered Oct 01 '22 15:10