
What does python print() function actually do?

I was looking at this question and started wondering what print actually does.

I have never found out how to use string.decode() and string.encode() to get a unicode string "out" in the Python interactive shell in the same format as print does. No matter what I do, I get either

  1. UnicodeEncodeError or
  2. the escaped string with "\x##" notation...

This is python 2.x, but I'm already trying to mend my ways and actually call print() :)

Example:

>>> import sys
>>> a = '\xAA\xBB\xCC'
>>> print(a)
ª»Ì
>>> a.encode(sys.stdout.encoding)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 0: ordinal not in range(128)
>>> a.decode(sys.stdout.encoding)
u'\xaa\xbb\xcc'

EDIT:

Why am I asking this? I am sick and tired of encode() errors, and I realized that print can do it (at least in the interactive shell). So I know that there MUST BE A WAY to magically do the encoding PROPERLY, by digging up the information about which encoding to use from somewhere...

ADDITIONAL INFO: I'm running Python 2.4.3 (#1, Sep 3 2009, 15:37:12) [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2

>>> sys.stdin.encoding
'ISO-8859-1'
>>> sys.stdout.encoding
'ISO-8859-1'

However, the results are the same with Python 2.6.2 (r262:71600, Sep 8 2009, 13:06:43) on the same linux box.

Kimvais asked Dec 30 '09 09:12



1 Answer

EDIT: (Major changes between this edit and the previous one... Note: I'm using Python 2.6.4 on an Ubuntu box.)

In my first attempt at an answer, I provided some general information on print and str, which I'm going to leave below for the benefit of anybody having simpler issues with print and chancing upon this question. As for a new attempt at dealing with the issue experienced by the OP... Basically, I'm inclined to say that there's no silver bullet here, and if print somehow manages to make sense of a weird string literal, then that's not reproducible behaviour. I'm led to this conclusion by the following funny interaction with Python in my terminal window:

>>> print '\xaa\xbb\xcc'
��

Have you tried to input ª»Ì directly from the terminal? At a Linux terminal using utf-8 as the encoding, this is actually read in as six bytes, which can then be made to look like three unicode chars with the help of the decode method:

>>> 'ª»Ì'
'\xc2\xaa\xc2\xbb\xc3\x8c'
>>> 'ª»Ì'.decode(sys.stdin.encoding)
u'\xaa\xbb\xcc'
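The byte counts above can be checked without depending on any particular terminal. This is only a minimal sketch: it uses escaped literals instead of typed characters, the variable names are mine, and it assumes Python 2.6+ or Python 3.3+ (where the u'' prefix is accepted).

```python
# Sketch: the same three characters, encoded the way two different
# terminals would hand them to Python. Variable names are illustrative.
text = u'\xaa\xbb\xcc'                  # the three characters as unicode

utf8_bytes = text.encode('utf-8')       # what a UTF-8 terminal would send
latin1_bytes = text.encode('latin-1')   # what an ISO-8859-1 terminal would send

print(len(utf8_bytes))                  # six bytes
print(len(latin1_bytes))                # three bytes

# Decoding each byte string with its own encoding recovers the same text.
print(utf8_bytes.decode('utf-8') == latin1_bytes.decode('latin-1'))
```

The point is that the byte string alone does not say which of the two it is; the encoding has to come from somewhere else.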

So, the '\xaa\xbb\xcc' literal only makes sense if you decode it as a latin-1 literal (well, actually you could use a different encoding which agrees with latin-1 on the relevant characters). As for print 'just working' in your case, it certainly doesn't for me -- as mentioned above.

When you use a string literal not prefixed with a u -- i.e. "asdf" rather than u"asdf" -- it is tempting to say that the resulting string uses some non-unicode encoding. As a matter of fact, though, the string object itself is encoding-unaware: it is just a sequence of bytes, and you have to treat it as if it were encoded with encoding x, for the correct value of x. This basic idea leads me to the following:

a = '\xAA\xBB\xCC'
a.decode('latin1')
# result: u'\xAA\xBB\xCC'
print(a.decode('latin1'))
# output: ª»Ì

Note the lack of decoding errors and the proper output (which I expect to stay proper on any other box). Apparently your string literal can be made sense of by Python, but not without some help.
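To see why the choice of x matters, here is a rough sketch that tries a few candidate encodings on the same bytes. The try_decode helper is hypothetical (it is not part of Python), and the b'' and u'' prefixes assume Python 2.6+ or 3.3+.

```python
# Sketch: the same bytes are valid in one encoding and invalid in others.
raw = b'\xaa\xbb\xcc'


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding.

    Hypothetical helper, for illustration only.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_latin1 = try_decode(raw, 'latin-1')  # every byte maps to a character
as_ascii = try_decode(raw, 'ascii')     # bytes above 0x7f are not ASCII
as_utf8 = try_decode(raw, 'utf-8')      # 0xaa cannot start a UTF-8 sequence

print(as_latin1 == u'\xaa\xbb\xcc')
print(as_ascii is None)
print(as_utf8 is None)
```

Note that latin-1 never fails, since it assigns a character to every byte value, so a successful decode is not proof that latin-1 was the right guess.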

Does this help? (At least in understanding how things work, if not in making the handling of encodings any easier...)


Now for some funny bits with some explanatory value (hopefully)! This works fine for me:

sys.stdout.write("\xAA\xBB\xCC".decode('latin1').encode(sys.stdout.encoding))

Skipping either the decode or the encode part results in a unicode-related exception. Theoretically speaking, this makes sense: the decode is needed to decide what characters there are in the given string, since the only thing obvious at first sight is what bytes there are. (The Python 3 idea of having (unicode) strings for characters and bytes for, well, bytes suddenly seems superbly reasonable.) The encode is needed so that the output respects the encoding of the output stream. Now this

sys.stdout.write("ąöî\n".decode(sys.stdin.encoding).encode(sys.stdout.encoding))

also works as expected, but the characters are actually coming from the keyboard and so are actually encoded with the stdin encoding... Also,

ord('ą'.decode('utf-8').encode('latin2'))

returns the correct 177 (my input encoding is utf-8), but '\xc4\x85'.encode('latin2') makes no sense to Python: it has no clue how to interpret '\xc4\x85' and figures that trying the 'ascii' codec is the best it can do.
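The decode-then-encode pattern used above can be wrapped up as follows. This is only a sketch: transcode is a name I made up, the encodings are passed in explicitly rather than read from sys.stdin or sys.stdout, and the b'' literal assumes Python 2.6+ or Python 3.

```python
# Sketch of the two-step conversion: bytes -> characters -> bytes.
def transcode(data, source_encoding, target_encoding):
    """Decode bytes to text, then encode that text for the target.

    Hypothetical helper, for illustration only.
    """
    text = data.decode(source_encoding)   # decide which characters these are
    return text.encode(target_encoding)   # produce bytes the target expects


utf8_input = b'\xc4\x85'                  # the UTF-8 bytes from the example above
latin2_output = transcode(utf8_input, 'utf-8', 'latin2')

print(ord(latin2_output))                 # 177
```

In real use the two encodings would come from sys.stdin.encoding and sys.stdout.encoding, but be aware that in Python 2 sys.stdout.encoding is None when output is piped, so a fallback is needed there.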


The original answer:

The relevant bit of Python docs (for version 2.6.4) says that print(obj) is meant to print out the string given by str(obj). I suppose you could then wrap it in a call to unicode (as in unicode(str(obj))) to get a unicode string out -- or you could just use Python 3 and exchange this particular nuisance for a couple of different ones. ;-)

Incidentally, this shows that you can manipulate the result of printing an object just like you can manipulate the result of calling str on an object, that is by messing with the __str__ method. Example:

class Foo(object):
    def __str__(self):
        return "I'm a Foo!"

print Foo()
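One way to check that print really goes through str is to capture what it writes and compare. The sketch below assumes Python 3, where print is a function with a file argument; buffer and captured are just illustrative names.

```python
import io


class Foo(object):
    def __str__(self):
        return "I'm a Foo!"


# Send print's output into an in-memory text buffer instead of the terminal.
buffer = io.StringIO()
print(Foo(), file=buffer)
captured = buffer.getvalue()

# What print wrote is str(obj) followed by the default end, a newline.
print(captured == str(Foo()) + "\n")
```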

As for the actual implementation of print, I expect this won't be useful at all, but if you really want to know what's going on... It's in the file Python/bltinmodule.c in the Python sources (I'm looking at version 2.6.4). Search for a line beginning with builtin_print. It's actually entirely straightforward, no magic going on there. :-)

Hopefully this answers your question... But if you do have a more arcane problem which I'm missing entirely, do comment, I'll make a second attempt. Also, I'm assuming we're dealing with Python 2.x; otherwise I guess I wouldn't have a useful comment.

Michał Marczyk answered Sep 20 '22 12:09