Python3 adds extra byte when printing hex values

Question

I have run into a strange difference between Python 2 and Python 3. Printing the same string yields an extra byte, C2, under Python 3. Python 2 behaves as I expected, and I would have expected Python 3 to do the same. What am I missing here?

$ python3 -c "print('\x30\xA0\x04\x08')" | xxd
0000000: 30c2 a004 080a     
$ python2 -c "print('\x30\xA0\x04\x08')" | xxd
0000000: 30a0 0408 0a

interjay · Accepted Answer

Python 3 strings are Unicode, and on your platform they are encoded as UTF-8 when printed. The UTF-8 encoding of the Unicode character U+00A0 is the two bytes 0xC2 0xA0, which is the extra C2 you see.

Python 2 strings are bytestrings, so their bytes are written out unchanged.
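
To check this yourself, here is a minimal Python 3 sketch that encodes the same four characters two ways. The variable names are only for illustration, and Latin-1 is used here simply because it maps each code point below 256 to the single byte with the same value, which mirrors what Python 2 wrote out.

```python
# The same four characters as in the question (a Python 3 str, i.e. Unicode).
text = '\x30\xA0\x04\x08'

# UTF-8 needs two bytes for U+00A0, so the result is five bytes long.
utf8_bytes = text.encode('utf-8')

# Latin-1 maps each code point below 256 to one byte of the same value.
latin1_bytes = text.encode('latin-1')

print(utf8_bytes.hex())    # 30c2a00408
print(latin1_bytes.hex())  # 30a00408
```

The UTF-8 result carries the extra C2 byte; the Latin-1 result matches the Python 2 output, minus the trailing newline that print adds.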

warvariuc · Answer

In Python 3, string literals are Unicode unless they carry a b prefix.

\xA0 is U+00A0, the no-break space, which takes two bytes when converted to UTF-8:

U+00A0 no-break space (HTML &#160; · &nbsp;) can be encoded in UTF-8 as C2 A0

To write the raw bytes without any text encoding, try this:

$ python3 -c "import sys; sys.stdout.buffer.write(b'\x30\xA0\x04\x08')" | xxd
0000000: 30a0 0408                                0...
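
The same idea as a small reusable sketch, in case you need it in a script rather than a one-liner. The helper name write_raw is hypothetical (it is not part of any library), and it assumes the target stream is a binary one such as sys.stdout.buffer. An in-memory io.BytesIO stands in for stdout here so the result can be inspected.

```python
import io
import sys


def write_raw(data, stream=None):
    """Write bytes unchanged to a binary stream.

    Hypothetical helper for illustration. When no stream is given it
    falls back to sys.stdout.buffer, the binary layer under sys.stdout.
    """
    if stream is None:
        stream = sys.stdout.buffer
    written = stream.write(data)  # no encoding step, no newline appended
    stream.flush()
    return written


# Use an in-memory binary stream in place of stdout to inspect the bytes.
sink = io.BytesIO()
count = write_raw(b'\x30\xA0\x04\x08', sink)
captured = sink.getvalue()
print(count, captured.hex())  # 4 30a00408
```

Unlike print, write() does not append a newline, which is why the trailing 0a byte is absent from the xxd output above.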

Tags:

python

python-3.x

python-2.x

Kai
