Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert surrogate pair to Unicode scalar in Swift

The following example is taken from the Strings and Characters documentation:

enter image description here

The values 55357 (U+D83D in hex) and 56374 (U+DC36 in hex) are the surrogate pairs that form the Unicode scalar U+1F436, which is the DOG FACE character. Is there any way to go the other direction? That is, can I convert a surrogate pair into a scalar?

I tried

let myChar: Character = "\u{D83D}\u{DC36}"

but I got an "Invalid Unicode scalar" error.

This Objective C answer and this project seem to be custom solutions, but is there anything built into Swift (especially Swift 2.0+) that does this?

like image 347
Suragch Avatar asked Jul 08 '15 02:07

Suragch


2 Answers

There are formulas to calculate the original code point based on a surrogate pair and vice versa. From https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae:

Section 3.7 of The Unicode Standard 3.0 defines the algorithms for converting to and from surrogate pairs.

A code point C greater than 0xFFFF corresponds to a surrogate pair <H, L> as per the following formula:

H = Math.floor((C - 0x10000) / 0x400) + 0xD800
L = (C - 0x10000) % 0x400 + 0xDC00

The reverse mapping, i.e. from a surrogate pair <H, L> to a Unicode code point C, is given by:

C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000
like image 125
Mathias Bynens Avatar answered Oct 29 '22 17:10

Mathias Bynens


Given an sequence of UTF-16 code units (i.e. 16-bit numbers, such as you get from String.utf16 or just an array of numbers), you can use the UTF16 type and its decode method to turn it into UnicodeScalars, which you can then convert into a String.

It’s a bit of a grungy item, that takes a generator (as it does stateful processing) and returns an enum that indicates a result (with an associated type of the scalar), or an error or completion. Swift 2.0 pattern matching makes it a lot easier to use:

let u16data: [UInt16] = [0xD83D,0xDC36]
//or let u16data = "Hello, 🌍".utf16

var g = u16data.generate()
var s: String = ""
var utf16 = UTF16()
while case let .Result(scalar) = utf16.decode(&g) {
    print(scalar, &s)
}
print(s) // prints 🐢
like image 3
Airspeed Velocity Avatar answered Oct 29 '22 17:10

Airspeed Velocity