str.encode() giving unexpected results

Tags:

character-encoding

I've been playing around with Python built-ins and have gotten some results that confuse me.

Take a look at this code:

>>> 'ü'.encode()
b'\xc3\xbc'

Why was \xc3\xbc (195 and 188 in decimal) returned? If you look at the ASCII table, ü is the 129th character. Or if you take a look here, ü is the 252nd Unicode character, which is what you get from

>>> ord('ü')
252

So where is the \xc3\xbc coming from, and why is it split up into two bytes? And when you decode with b'\xc3\xbc'.decode(), how does it know that these two bytes represent one character?

730

asked Apr 25 '21 01:04

Have a nice day

1 Answer

On the table you're looking at, you're reading the section titled "Extended ASCII", more commonly known as ISO/IEC 8859-1, or latin1. ASCII, as a character set, defines 7-bit characters from 0 to 127. latin1 assigns characters to the remaining 128 single-byte values (128 to 255) and is an extension of ASCII. In Python 3, str.encode() defaults to UTF-8, which also extends ASCII (and hence is compatible with it) but is incompatible with latin1.
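To make the difference concrete, here is a small sketch that encodes the same character with the three encodings mentioned above. The variable names are only illustrative.

```python
text = "ü"

utf8_bytes = text.encode()             # no argument: defaults to UTF-8
latin1_bytes = text.encode("latin-1")  # one byte, equal to the code point

print(utf8_bytes)    # b'\xc3\xbc'
print(latin1_bytes)  # b'\xfc'

# ASCII only covers 0..127, so it cannot encode ü at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False

# The latin1 byte 0xFC is not valid UTF-8, which is what
# "incompatible with latin1" means in practice.
try:
    latin1_bytes.decode("utf-8")
    cross_decode_ok = True
except UnicodeDecodeError:
    cross_decode_ok = False
print(cross_decode_ok)  # False
```

Passing the encoding explicitly, as in text.encode("utf-8"), avoids any doubt about which one is in use.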

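The character ü has Unicode code point 0xFC (252 in decimal) and, when using UTF-8, is encoded using two bytes, 0xC3 and 0xBC. UTF-8 stores only code points 0 to 127 in a single byte; anything above that takes two to four bytes. The leading bits of the first byte (110) tell the decoder that exactly one continuation byte (starting with 10) follows, which is how decode() knows the two bytes belong to one character.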
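As a rough illustration of how the two bytes map back to 252, the following sketch decodes them by hand using the UTF-8 two-byte layout (110xxxxx 10xxxxxx). It is not how Python implements decode() internally, just the same arithmetic written out.

```python
data = "ü".encode("utf-8")   # b'\xc3\xbc'
lead, cont = data[0], data[1]

lead_bits = format(lead, "08b")  # '11000011'
cont_bits = format(cont, "08b")  # '10111100'
print(lead_bits, cont_bits)

# A lead byte starting with 110 announces a two-byte sequence;
# a continuation byte always starts with 10.
is_two_byte_lead = (lead & 0b11100000) == 0b11000000
is_continuation = (cont & 0b11000000) == 0b10000000

# Strip the marker bits and join the payload: 5 bits + 6 bits.
code_point = ((lead & 0b00011111) << 6) | (cont & 0b00111111)

print(code_point)       # 252
print(chr(code_point))  # ü
```

The payload bits are 00011 and 111100, which together form 0b11111100, the same 252 that ord('ü') returns.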

Lots of online ASCII tables get this wrong. It's inaccurate to call the code points 128 up to 255 ASCII characters, because ASCII doesn't claim to assign any value to those code points.
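The sketch below shows why byte values above 127 are ambiguous without naming an encoding. The 129 from the question matches IBM code page 437, another layout that tables often label "extended ASCII"; which table was actually consulted is an assumption here.

```python
byte_129 = bytes([129])
byte_252 = bytes([252])

# The same byte value means different things in different code pages.
cp437_char = byte_129.decode("cp437")     # 'ü' in IBM code page 437
latin1_char = byte_252.decode("latin-1")  # 'ü' in latin1
latin1_129 = byte_129.decode("latin-1")   # a control character, not ü

print(cp437_char, latin1_char, repr(latin1_129))

# ASCII itself assigns nothing to values above 127.
try:
    byte_129.decode("ascii")
    ascii_defines_129 = True
except UnicodeDecodeError:
    ascii_defines_129 = False
print(ascii_defines_129)  # False
```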

146

answered Sep 27 '22 21:09

Silvio Mayolo
