Python - Reading Emoji Unicode Characters

Tags:

python

unicode

python-2.7

emoji

I have a Python 2.7 program which reads iOS text messages from a SQLite database. The text messages are unicode strings. In the following text message:

u'that\u2019s \U0001f63b'

The apostrophe is represented by \u2019, but the emoji is represented by \U0001f63b. I looked up the code point for the emoji in question, and it's \uf63b. I'm not sure where the 0001 is coming from. I know comically little about character encodings.

When I print the text, character by character, using:

s = u'that\u2019s \U0001f63b'

for c in s:
    print c.encode('unicode_escape')

The program produces the following output:

t
h
a
t
\u2019
s

\ud83d
\ude3b

How can I correctly read these last characters in Python? Am I using encode correctly here? Should I just attempt to trash those 0001s before reading it, or is there an easier, less silly way?


asked Jul 07 '15 22:07

Andrew LaPrise

1 Answer

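You don't need encode here. What you have is a valid unicode string containing one 4-digit escape sequence (\u2019) and one 8-digit escape sequence (\U0001f63b). The 8-digit form is how Python writes code points above U+FFFF, so the leading 0001 is part of the emoji's code point, U+1F63B, and must not be stripped. Try this in the REPL on, say, OS X: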

>>> s = u'that\u2019s \U0001f63b'
>>> print s
that’s 😻
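The \ud83d and \ude3b in the question's output look like a UTF-16 surrogate pair, which is what a narrow Python 2 build stores for a code point above U+FFFF (that the asker's interpreter is a narrow build is an assumption based on that output). A minimal sketch of the arithmetic that joins the pair back into one code point follows; the helper name combine_surrogates is hypothetical, not a library function:

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

code_point = combine_surrogates(0xD83D, 0xDE3B)
print(hex(code_point))  # 0x1f63b
```

So the two "characters" seen in the loop are two halves of the single code point U+1F63B, which is exactly what \U0001f63b spells out.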

In Python 3, though, indexing returns the emoji as a single character:

Python 3.4.3 (default, Jul  7 2015, 15:40:07) 
>>> s = u'that\u2019s \U0001f63b'
>>> s[-1]
'😻'
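As a rough illustration of the Python 3 behaviour, here is a short sketch (assuming Python 3.3 or later, where a string is indexed by code point) that walks the string and prints each character's code point. The variable name code_points is only for illustration:

```python
s = 'that\u2019s \U0001f63b'

# In Python 3 each element of the string is one full code point,
# so the emoji is a single item rather than two surrogates.
code_points = [ord(ch) for ch in s]

for cp in code_points:
    print('U+{:04X}'.format(cp))

print(len(s))  # 8
```

The last line printed by the loop is U+1F63B, matching the 8-digit escape in the original string.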

answered Oct 31 '22 09:10

pvg
