Why does an emoji have two different UTF-8 encodings? How do I convert an emoji from UTF-8 using NSString in iOS?

Tags: ios, unicode, utf-8, nsstring, emoji

We have found an issue: some emoji have two different UTF-8 encodings, for example:

emoji   code point   UTF-8                another UTF-8
😁      U+1F601      \xf0\x9f\x98\x81     \xed\xa0\xbd\xed\xb8\x81

But iOS can't decode the second form, so decoding the string from UTF-8 results in an error.

ios code

In every document I found, I can only find one UTF-8 encoding for an emoji; the other form appears nowhere.

Documents I referenced include:

emoji code link

whole utf-8 code link

But the web tool bianma converts both forms into the emoji correctly.

input code

output

So, my questions are:

Why are there two UTF-8 encodings for one emoji?
Is there a document that lists both forms?
How do I correctly convert a string from UTF-8 using NSString on iOS?


asked Dec 22 '15 05:12

pinchwang

1 Answer

0xF0, 0x9F, 0x98, 0x81

Is the correct UTF-8 encoding for U+1F601 😁.
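To see where those four bytes come from, here is a minimal hand-rolled sketch of the RFC 3629 bit layout in Swift. The function name `utf8Bytes(of:)` is hypothetical and only for illustration; in real code you would just use a string's `utf8` view.

```swift
// Hand-rolled UTF-8 encoder for a single code point, following the
// RFC 3629 bit layout. Returns nil for surrogates and values above U+10FFFF,
// which UTF-8 cannot encode.
func utf8Bytes(of value: UInt32) -> [UInt8]? {
    switch value {
    case 0x0000...0x007F:
        // 0xxxxxxx
        return [UInt8(value)]
    case 0x0080...0x07FF:
        // 110xxxxx 10xxxxxx
        return [UInt8(0xC0 | (value >> 6)),
                UInt8(0x80 | (value & 0x3F))]
    case 0xD800...0xDFFF:
        // Surrogates are reserved for UTF-16 and are not valid in UTF-8.
        return nil
    case 0x0800...0xFFFF:
        // 1110xxxx 10xxxxxx 10xxxxxx
        return [UInt8(0xE0 | (value >> 12)),
                UInt8(0x80 | ((value >> 6) & 0x3F)),
                UInt8(0x80 | (value & 0x3F))]
    case 0x1_0000...0x10_FFFF:
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return [UInt8(0xF0 | (value >> 18)),
                UInt8(0x80 | ((value >> 12) & 0x3F)),
                UInt8(0x80 | ((value >> 6) & 0x3F)),
                UInt8(0x80 | (value & 0x3F))]
    default:
        return nil
    }
}

// U+1F601 needs the 4-byte form; compare with the standard library's encoder.
let manual = utf8Bytes(of: 0x1F601)!
print(manual.map { String($0, radix: 16, uppercase: true) })  // ["F0", "9F", "98", "81"]
print(manual == Array("\u{1F601}".utf8))                      // true
```

Note that the surrogate range is refused outright: UTF-8 has exactly one encoding per code point.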

0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x81

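Is not a valid UTF-8 sequence (*): it encodes the surrogate code points U+D83D and U+DE01 as if they were characters, and RFC 3629 forbids encoding surrogates in UTF-8. It should really be rejected; iOS is correct to do so.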
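Swift's standard library draws the same line. The sketch below (the function name `decodeStrictUTF8` is hypothetical) uses the stdlib `UTF8` codec, which reports an error for 3-byte sequences that would decode to a surrogate, and returns nil rather than substituting U+FFFD. It is not the NSString code path itself, but on iOS `String(data:encoding: .utf8)` and the corresponding `NSString` initializers likewise return nil when the data is not valid for the encoding, which is consistent with the error in the question.

```swift
// Strict UTF-8 decoding with the standard library's UTF8 codec.
// The codec reports an error for ill-formed input, including 3-byte
// sequences that would decode to surrogates (ED A0..BF xx).
func decodeStrictUTF8(_ bytes: [UInt8]) -> String? {
    var iterator = bytes.makeIterator()
    var codec = UTF8()
    var scalars = String.UnicodeScalarView()
    decoding: while true {
        switch codec.decode(&iterator) {
        case .scalarValue(let scalar):
            scalars.append(scalar)
        case .emptyInput:
            break decoding
        case .error:
            // Stop at the first ill-formed sequence instead of inserting U+FFFD.
            return nil
        }
    }
    return String(scalars)
}

let utf8Form: [UInt8] = [0xF0, 0x9F, 0x98, 0x81]
let cesu8Form: [UInt8] = [0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x81]
print(decodeStrictUTF8(utf8Form) ?? "rejected")   // 😁
print(decodeStrictUTF8(cesu8Form) ?? "rejected")  // rejected
```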

This is a bug in the bianma tool: its convertUtf8BytesToUnicodeCodePoints function is more lenient about what input it accepts than the algorithm specified in, e.g., RFC 3629.

This happens to return a working string only because the tool is written in JavaScript. After decoding the byte sequence above to the bogus surrogate code point sequence U+D83D, U+DE01, it converts that into a JavaScript string using a direct code-point-to-code-unit mapping, giving \uD83D\uDE01. Since that is exactly how 😁 is encoded in a UTF-16 string (which is what JavaScript strings are), it appears to have worked.
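To make that failure mode concrete, here is a Swift sketch of what the lenient decoder is described as doing. It is not bianma's actual code (the function name `lenientThreeByteCodePoints` is hypothetical): it applies the generic 3-byte formula without rejecting the surrogate range, then reinterprets the resulting values directly as UTF-16 code units, as a JavaScript string would.

```swift
// Lenient decoding in the style described above: the generic 3-byte UTF-8
// formula is applied WITHOUT rejecting the surrogate range D800-DFFF.
// Sketch only: assumes the input consists entirely of 3-byte sequences.
func lenientThreeByteCodePoints(_ bytes: [UInt8]) -> [UInt32] {
    precondition(bytes.count % 3 == 0, "sketch handles 3-byte sequences only")
    var values: [UInt32] = []
    for i in stride(from: 0, to: bytes.count, by: 3) {
        let b0 = UInt32(bytes[i] & 0x0F)      // 1110xxxx -> 4 bits
        let b1 = UInt32(bytes[i + 1] & 0x3F)  // 10xxxxxx -> 6 bits
        let b2 = UInt32(bytes[i + 2] & 0x3F)  // 10xxxxxx -> 6 bits
        values.append((b0 << 12) | (b1 << 6) | b2)
    }
    return values
}

let bogus: [UInt8] = [0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x81]
let codePoints = lenientThreeByteCodePoints(bogus)
print(codePoints.map { String($0, radix: 16, uppercase: true) })  // ["D83D", "DE01"]

// Treating each value as a UTF-16 code unit yields a valid surrogate pair:
// the correct UTF-16 form of U+1F601.
let units = codePoints.map { UInt16($0) }
print(String(decoding: units, as: UTF16.self) == "\u{1F601}")  // true
```

The two mistakes cancel out: the bytes are not valid UTF-8, but the bogus "code points" happen to be exactly the right UTF-16 code units.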

(*: It is a valid CESU-8 sequence, but that encoding is just “bogus broken encoding for compatibility with badly-written historical tools” and should generally be avoided.)
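CESU-8 differs from UTF-8 only for characters outside the Basic Multilingual Plane: such a character is first split into its UTF-16 surrogate pair, and each surrogate is then encoded as if it were a 3-byte character. A minimal sketch (the function name `cesu8Bytes` is hypothetical) reproduces the second byte sequence from the question exactly.

```swift
// CESU-8 sketch: encode each UTF-16 code unit of the string on its own with
// the 1-, 2- or 3-byte UTF-8 layout. A supplementary character therefore
// becomes two 3-byte sequences (one per surrogate) instead of one 4-byte one.
func cesu8Bytes(_ string: String) -> [UInt8] {
    var out: [UInt8] = []
    for unit in string.utf16 {
        let u = UInt32(unit)
        if u < 0x80 {
            out.append(UInt8(u))
        } else if u < 0x800 {
            out.append(UInt8(0xC0 | (u >> 6)))
            out.append(UInt8(0x80 | (u & 0x3F)))
        } else {
            out.append(UInt8(0xE0 | (u >> 12)))
            out.append(UInt8(0x80 | ((u >> 6) & 0x3F)))
            out.append(UInt8(0x80 | (u & 0x3F)))
        }
    }
    return out
}

print(cesu8Bytes("\u{1F601}").map { String($0, radix: 16, uppercase: true) })
// ["ED", "A0", "BD", "ED", "B8", "81"]
```

For text made only of BMP characters the output is byte-for-byte identical to UTF-8, which is why such data can go unnoticed until an emoji shows up.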

You should not usually encounter a sequence like this; it is typically not worth catering for unless you have a specific source of this kind of malformed data which you don't have the power to get fixed.
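If you do have such a source, one option is to repair the bytes before decoding them. This is a hypothetical sketch (`repairCESU8Pairs` and `isContinuation` are not Apple APIs): it rewrites each well-formed 6-byte CESU-8 surrogate pair into the 4-byte UTF-8 form and passes every other byte through untouched, leaving any remaining invalid input for the real UTF-8 decoder to judge. On iOS the repaired bytes could then go through the usual `String(data:encoding: .utf8)` or `NSString` initializer.

```swift
// Hypothetical repair pass for data known to contain CESU-8 surrogate pairs.
// A pair looks like: ED A0-AF xx  ED B0-BF xx  (high surrogate, low surrogate).
func isContinuation(_ byte: UInt8) -> Bool {
    byte & 0xC0 == 0x80
}

func repairCESU8Pairs(_ bytes: [UInt8]) -> [UInt8] {
    var out: [UInt8] = []
    out.reserveCapacity(bytes.count)
    var i = 0
    while i < bytes.count {
        if i + 5 < bytes.count,
           bytes[i] == 0xED, bytes[i + 1] & 0xF0 == 0xA0, isContinuation(bytes[i + 2]),
           bytes[i + 3] == 0xED, bytes[i + 4] & 0xF0 == 0xB0, isContinuation(bytes[i + 5]) {
            // Decode both 3-byte groups to surrogate values (D800-DBFF, DC00-DFFF).
            let high: UInt32 = 0xD000 | (UInt32(bytes[i + 1] & 0x3F) << 6) | UInt32(bytes[i + 2] & 0x3F)
            let low: UInt32 = 0xD000 | (UInt32(bytes[i + 4] & 0x3F) << 6) | UInt32(bytes[i + 5] & 0x3F)
            // Combine the pair into a supplementary code point.
            let scalar: UInt32 = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
            // Re-encode it with the 4-byte UTF-8 layout.
            out += [UInt8(0xF0 | (scalar >> 18)),
                    UInt8(0x80 | ((scalar >> 12) & 0x3F)),
                    UInt8(0x80 | ((scalar >> 6) & 0x3F)),
                    UInt8(0x80 | (scalar & 0x3F))]
            i += 6
        } else {
            // Anything else passes through unchanged.
            out.append(bytes[i])
            i += 1
        }
    }
    return out
}

let repaired = repairCESU8Pairs([0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x81])
print(String(decoding: repaired, as: UTF8.self) == "\u{1F601}")  // true
```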


answered Sep 29 '22 13:09

bobince
