The following example is taken from the Strings and Characters documentation:
The values 55357 (0xD83D in hex) and 56374 (0xDC36 in hex) are the surrogate pair that forms the Unicode scalar U+1F436, which is the DOG FACE character. Is there any way to go the other direction? That is, can I convert a surrogate pair into a scalar?
I tried
let myChar: Character = "\u{D83D}\u{DC36}"
but I got an "Invalid Unicode scalar" error.
This Objective-C answer and this project seem to be custom solutions, but is there anything built into Swift (especially Swift 2.0+) that does this?
There are formulas to calculate the original code point based on a surrogate pair and vice versa. From https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae:
Section 3.7 of The Unicode Standard 3.0 defines the algorithms for converting to and from surrogate pairs.

A code point C greater than 0xFFFF corresponds to a surrogate pair <H, L> as per the following formula:

H = Math.floor((C - 0x10000) / 0x400) + 0xD800
L = (C - 0x10000) % 0x400 + 0xDC00

The reverse mapping, i.e. from a surrogate pair <H, L> to a Unicode code point C, is given by:

C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000
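To make the reverse formula concrete, here is a minimal sketch of it in Swift. The helper name scalarFromSurrogatePair is my own, not a standard API, and the failable UnicodeScalar(_: UInt32) initializer it uses is from Swift 3 and later (the Swift 2 equivalent traps on invalid input instead):

func scalarFromSurrogatePair(high: UInt16, low: UInt16) -> UnicodeScalar? {
    // Check we really have a high surrogate followed by a low surrogate.
    guard (0xD800...0xDBFF).contains(high), (0xDC00...0xDFFF).contains(low) else {
        return nil
    }
    // C = (H - 0xD800) * 0x400 + (L - 0xDC00) + 0x10000
    let c = (UInt32(high) - 0xD800) * 0x400 + (UInt32(low) - 0xDC00) + 0x10000
    return UnicodeScalar(c) // always succeeds once the ranges are checked
}

scalarFromSurrogatePair(high: 0xD83D, low: 0xDC36) // U+1F436 DOG FACE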
Given a sequence of UTF-16 code units (i.e. 16-bit numbers, such as you get from String.utf16, or just an array of numbers), you can use the UTF16 type and its decode method to turn them into UnicodeScalars, which you can then convert into a String.
It's a bit of a grungy API: decode takes a generator (since it does stateful processing) and returns an enum that indicates either a result (with the scalar as an associated value), an error, or the end of input. Swift 2.0 pattern matching makes it a lot easier to use:
let u16data: [UInt16] = [0xD83D, 0xDC36]
// or: let u16data = "Hello, 🐶".utf16
var g = u16data.generate()
var s: String = ""
var utf16 = UTF16()
// .Result is returned for each successfully decoded scalar;
// the loop stops on .EmptyInput (done) or .Error (bad input)
while case let .Result(scalar) = utf16.decode(&g) {
    s.append(scalar)
}
print(s) // prints 🐶
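As an aside beyond the Swift 2 code above: newer Swift versions (Swift 4 and later, to the best of my knowledge) also have a built-in one-liner for this, String(decoding:as:). Note that it never fails; invalid code units are replaced with U+FFFD instead:

let dog = String(decoding: [0xD83D, 0xDC36] as [UInt16], as: UTF16.self)
print(dog) // 🐶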