
Converting to ASCII with numbers above 128

Simple question, so there should be a simple answer ^_^ I looked around and found nothing. I'm using Python 3.4 and can convert numbers up to 128 with the

print(chr(number))

method without trouble: 104 gives me "h", and 73 gives me "I". However, when I use numbers higher than 128, I get the wrong character. I think it's converting to Unicode or something like that? For example, 193 gives me Á instead of the "bottom" sign (an upside-down T).

w1nter asked Jun 02 '15 13:06


1 Answer

All text in Python 3 is Unicode. ASCII just happens to be a subset of the Unicode standard.

So chr(codepoint) always converts to a Unicode character, where the first 128 codepoints also conform to the ASCII standard.
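As a rough illustration of that point, the sketch below compares a value inside the ASCII range with one outside it. The helper name is_ascii is made up for this example and is not part of the standard library.

```python
# chr() maps an integer to the Unicode character at that codepoint.
low = chr(104)    # inside the ASCII range (0-127)
high = chr(193)   # U+00C1, outside the ASCII range

# ord() is the inverse of chr().
assert ord(low) == 104 and ord(high) == 193


def is_ascii(ch):
    """Hypothetical helper: True if ch can be encoded as ASCII."""
    try:
        ch.encode('ascii')
        return True
    except UnicodeEncodeError:
        return False


print(low, is_ascii(low))     # h True
print(high, is_ascii(high))   # Á False
```

Both characters are valid Unicode; only the first one is also ASCII.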

I'm not sure what you were expecting for values > 127, as the ASCII standard only defines 128 codepoints (0 through 127). Most codecs in use today are extensions of the ASCII standard. If you expected a specific codec, you need to build a bytes object and decode it with that codec. For example, to use the Windows 1252 codepage you could use:

>>> bytes([128]).decode('cp1252')
'€'

That codepage defines codepoint 128 as the Euro sign, while the Unicode standard places the Euro sign at U+20AC (hexadecimal).
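To make the difference between the byte value and the Unicode codepoint concrete, here is a small sketch using only built-ins. The variable name euro is just for illustration.

```python
# Byte value 128 in Windows-1252 decodes to the Euro sign.
euro = bytes([128]).decode('cp1252')

# The resulting character lives at U+20AC in Unicode, not at 128.
print(hex(ord(euro)))          # 0x20ac

# Encoding back to cp1252 recovers the original single byte.
print(euro.encode('cp1252'))   # b'\x80'

# chr(128) is a different character: the C1 control code U+0080.
print(chr(128) == euro)        # False
```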

I think you were looking for codepage 437 here, a codepage that includes box-drawing characters; 193 is indeed an inverted T in that codepage:

>>> bytes([193]).decode('cp437')
'┴'

That's U+2534 BOX DRAWINGS LIGHT UP AND HORIZONTAL in the Unicode standard. To be absolutely clear: codepoints past 127 exist in codepage 437 but are not ASCII.
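If codepage 437 is what you want everywhere, one possible approach is to wrap the decode step in a small function and use it in place of chr(). The name cp437_chr is an assumption made up for this sketch; it only accepts values from 0 to 255, since it goes through a single byte.

```python
import unicodedata


def cp437_chr(value):
    """Hypothetical helper: decode one byte value (0-255) as codepage 437."""
    return bytes([value]).decode('cp437')


# Show the character, its Unicode codepoint, and its official name.
for value in (104, 73, 193):
    ch = cp437_chr(value)
    print(value, ch, 'U+{:04X}'.format(ord(ch)), unicodedata.name(ch))
```

Values below 128 come out the same as with chr(), because codepage 437 agrees with ASCII in that range.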

You may want to read up on Unicode and Python in this context:

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

  • Pragmatic Unicode by Ned Batchelder

  • The Python Unicode HOWTO

Martijn Pieters answered Sep 18 '22 07:09