How can I convert Extended ASCII characters such as: "æ, ö or ç" into non-extended ASCII characters (a,o,c) using python? The way it works should be that if it takes "A, Æ ,Ä" as input, It returns A for all of them.
On a standard 101 keyboard, special extended ASCII characters such as é or ß can be typed by holding the ALT key and typing the corresponding 4 digit ASCII code. For example é is typed by holding the ALT key and typing 0233 on the keypad.
Here are few methods in different programming languages to print ASCII value of a given character : Python code using ord function : ord() : It converts the given string of length one, returns an integer representing the Unicode code point of the character. For example, ord('a') returns the integer 97.
UTF-8 extends the ASCII character set to use 8-bit code points, which allows for up to 256 different characters. This means that UTF-8 can represent all of the printable ASCII characters, as well as the non-printable characters.
Extended-ASCII, with numeric code points between 128 to 255 decimal (80 to FF hexadecimal, 1000 0000 to 1111 1111 binary), collides with UTF-8 because it has the leftmost bit set to one, and this tells the interpreter that one (at least one) additional byte is required to form the character.
Unidecode might be of use to you.
Python 3.2.3 (default, Jun 8 2012, 05:36:09)
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from unidecode import unidecode
>>> unidecode("æ, ö or ç")
'ae, o or c'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With