I have a list of UTF-16 code points that I need to convert to the actual characters they represent programmatically. This seems unbelievably hard to do in Python 3.
For example, I have the numbers 55357 and 56501 for one character, which I know is this banknote emoji: 💵. But I have no idea how to convert that in Python. I first tried chr(55357) + chr(56501), but Python seems to assume that it is UTF-8 encoded and thus gives me broken Unicode.
I then tried re-encoding the string, but since it's broken UTF-8, it gives me what seems to be broken UTF-16. If I tell it to leave it alone with (chr(55357) + chr(56501)).encode('utf-8', 'surrogatepass'), I can actually get valid bytes for the character, but they're encoded in...CESU-8, for reasons I cannot yet grasp. This is not an encoding Python supports natively, and I can't find a codec to convert it.
I think I could probably write these to the disk and then read them with the right encoding, but that sounds really terrible.
Is there a reasonable way to do this in Python 3?
In Python, the built-in functions chr() and ord() are used to convert between Unicode code points and characters. A character can also be represented by writing a hexadecimal Unicode code point with \x, \u, or \U in a string literal.
UTF-8 is a variable-length encoding, so I'll assume you really meant "Unicode code point". Use chr() to convert a code point to a character, and ord() to get the code point back. In Python 2, chr() only covers byte values, i.e. numbers in the [0..255] range; unichr() is the function that handles full Unicode code points.
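A quick sketch of both in CPython 3 (the code point used below is just the banknote emoji from the question):

print(chr(0x1F4B5))      # 💵  -- code point to character
print(ord('💵'))         # 128181, i.e. 0x1F4B5  -- character back to code point
print('\U0001F4B5')      # the same character written as a \U escape in a literal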
UTF-16 is an encoding of Unicode in which each character is composed of either one or two 16-bit code units. Unicode was originally designed as a pure 16-bit encoding, aimed at representing all modern scripts.
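For example (a small sketch using the built-in codecs), encoding to UTF-16 shows exactly where the asker's two numbers come from:

print('A'.encode('utf-16-le'))    # b'A\x00'           -- one 16-bit code unit
print('💵'.encode('utf-16-le'))   # b'=\xd8\xb5\xdc'   -- two code units (a surrogate pair)
print(0xD83D, 0xDCB5)             # 55357 56501        -- the numbers from the question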
Since Python 3.0, the language's str type contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.
Unicode's roughly 137,000 characters are each represented by a code point, and a code point refers to an actual character that is displayed. These code points are encoded to bytes for storage or transmission and decoded from bytes back to code points.
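A minimal round trip (UTF-8 is just one possible encoding here):

s = 'unicode rocks! 💵'           # str: a sequence of code points
raw = s.encode('utf-8')           # bytes: the encoded form
assert raw.decode('utf-8') == s   # decoding recovers the original code points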
Decode incoming bytes to str as early as possible, run your processing on Unicode code points inside your Python code, and only encode back into bytes (for example with the UTF-8 encoder) when writing to a file at the end. This is called the Unicode sandwich. Read/watch the excellent talk by Ned Batchelder (@nedbat) about this.
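A sketch of that sandwich with files (the file names and the UTF-8 choice are just placeholders):

with open('in.txt', 'rb') as f:
    text = f.read().decode('utf-8')        # bytes -> str at the input edge
processed = text.upper()                   # all processing happens on str
with open('out.txt', 'wb') as f:
    f.write(processed.encode('utf-8'))     # str -> bytes only at the output edge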
When encoded as UTF-8, a given Unicode character can occupy anywhere from one to four bytes. The length of a single Unicode character as a Python str will always be 1, no matter how many bytes it occupies; the length of the same character encoded to bytes will be anywhere between 1 and 4. Here's an example of a single Unicode character taking up four bytes:
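(Any character above U+FFFF will do; the banknote emoji from the question serves as the illustration here.)

c = '\U0001F4B5'                 # 💵
print(len(c))                    # 1 -- always one character as a Python str
print(len(c.encode('utf-8')))    # 4 -- four bytes once encoded to UTF-8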
Since the Python 3 release, it is not necessary to write the prefix u, as all strings are Unicode strings by default. The method chr() is the inverse of the method ord(): chr() gets the character that a Unicode code point corresponds to.
The trick is not to mess with chr, but rather to convert to a byte array, which you can then decode into a string:
a, b = 55357, 56501                                      # the two UTF-16 code units (a surrogate pair)
x = a.to_bytes(2, 'little') + b.to_bytes(2, 'little')    # pack each unit as two little-endian bytes
print(x.decode('utf-16-le'))                             # decode the buffer as little-endian UTF-16 -> 💵
This can be generalized for any number of integers:
data = [55357, 56501]                                          # any sequence of UTF-16 code units
b = bytes([x for c in data for x in c.to_bytes(2, 'little')])  # pack every unit as two little-endian bytes
result = b.decode('utf-16-le')                                 # decode the whole buffer at once -> '💵'
The reason something like chr(55357) + chr(56501) doesn't work is that chr assumes no encoding: it works on raw Unicode code points, so you are combining two distinct characters (here, two lone surrogates). As the other answer points out, you then have to encode that two-character string and re-decode it, or just build the bytes and decode once, as I'm suggesting.
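For illustration, a small sketch (CPython 3) of both the failure mode and the encode-then-decode route the asker was circling around with surrogatepass:

s = chr(55357) + chr(56501)   # two lone surrogates: U+D83D and U+DCB5
print(len(s))                 # 2 -- two code points, not one character
# s.encode('utf-8')           # raises UnicodeEncodeError: surrogates not allowed
print(s.encode('utf-16-le', 'surrogatepass').decode('utf-16-le'))   # 💵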