Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to express Strings in Swift using Unicode hexadecimal values (UTF-16)

Tags:

I want to write a Unicode string using hexadecimal values in Swift. I have read the documentation for String and Character so I know that I can use special Unicode characters directly in strings like the following:

var variableString = "Cat‼🐱" // "Cat" + Double Exclamation + cat emoji

But I would like to do it using the Unicode code points. The docs (and this question) show it for characters, but are not very clear about how to do it for strings.

(Note: Although the answer seems obvious to me now, it wasn't obvious at all just a short time ago. I am answering my own question below as a means of learning how to do this and also to help myself understand Unicode terminology and how Swift Characters and Strings work.)

like image 485
Suragch Avatar asked Jul 08 '15 05:07

Suragch


People also ask

What is Unicode in Swift?

The Unicode. Scalar type, representing a single Unicode scalar value, is the element type of a string's unicodeScalars collection. You can create a Unicode. Scalar instance by using a string literal that contains a single character representing exactly one Unicode scalar value.

What is Unicode string type?

Unicode is a standard encoding system that is used to represent characters from almost all languages. Every Unicode character is encoded using a unique integer code point between 0 and 0x10FFFF . A Unicode string is a sequence of zero or more code points.

Is Hex a Unicode?

Unicode characters are distinguished by code points, which are conventionally represented by "U+" followed by four, five or six hexadecimal digits, for example U+00AE or U+1D310.


2 Answers

from your Hex "0x1F52D" to actual Emoji

let c = 0x1F602

next step would possibly getting an Uint32 from your Hex

let intEmoji = UnicodeScalar(c!).value

from this you can do something like

titleLabel.text = String(UnicodeScalar(intEmoji)!)

here you have a "😂"

it work with range of hexadecimal too

let emojiRanges = [
            0x1F600...0x1F636,
            0x1F645...0x1F64F,
            0x1F910...0x1F91F,
            0x1F30D...0x1F52D
        ]

        for range in emojiRanges {
            for i in range {
                let c = UnicodeScalar(i)!.value
                data.append(c)
            }
        }

to get multiple UInt32 from your Hex range for exemple

like image 29
Mehdi S. Avatar answered Oct 02 '22 13:10

Mehdi S.


Updated for Swift 3

Character

The Swift syntax for forming a hexadecimal code point is

\u{n}

where n is a hexadecimal number up to 8 digits long. The valid range for a Unicode scalar is U+0 to U+D7FF and U+E000 to U+10FFFF inclusive. (The U+D800 to U+DFFF range is for surrogate pairs, which are not scalars themselves, but are used in UTF-16 for encoding the higher value scalars.)

Examples:

// The following forms are equivalent. They all produce "C". 
let char1: Character = "\u{43}"
let char2: Character = "\u{0043}"
let char3: Character = "\u{00000043}"

// Higher value Unicode scalars are done similarly
let char4: Character = "\u{203C}" // ‼ (DOUBLE EXCLAMATION MARK character)
let char5: Character = "\u{1F431}" // 🐱 (cat emoji)

// Characters can be made up of multiple scalars
let char7: Character = "\u{65}\u{301}" // é = "e" + accent mark
let char8: Character = "\u{65}\u{301}\u{20DD}" // é⃝ = "e" + accent mark + circle

Notes:

  • Leading zeros can be added or omitted
  • Characters are known as extended grapheme clusters. Even when they are composed of multiple scalars, they are still considered a single character. What is key is that they appear to be a single character (grapheme) to the user.
  • TODO: How to convert surrogate pair to Unicode scalar in Swift

String

Strings are composed of characters. See the following examples for some ways to form them using hexadecimal code points.

Examples:

var string1 = "\u{0043}\u{0061}\u{0074}\u{203C}\u{1F431}" // Cat‼🐱

// pass an array of characters to a String initializer
let catCharacters: [Character] = ["\u{0043}", "\u{0061}", "\u{0074}", "\u{203C}", "\u{1F431}"] // ["C", "a", "t", "‼", "🐱"]
let string2 = String(catCharacters) // Cat‼🐱

Converting Hex Values at Runtime

At runtime you can convert hexadecimal or Int values into a Character or String by first converting it to a UnicodeScalar.

Examples:

// hex values
let value0: UInt8  = 0x43     // 97
let value1: UInt16 = 0x203C   // 22823
let value2: UInt32 = 0x1F431  // 127822

// convert hex to UnicodeScalar
let scalar0 = UnicodeScalar(value0)
// make sure that UInt16 and UInt32 form valid Unicode values
guard
    let scalar1 = UnicodeScalar(value1),
    let scalar2 = UnicodeScalar(value2) else {
    return
}

// convert to Character
let character0 = Character(scalar0) // C
let character1 = Character(scalar1) // ‼
let character2 = Character(scalar2) // 🐱

// convert to String
let string0 = String(scalar0) // C
let string1 = String(scalar1) // ‼
let string2 = String(scalar2) // 🐱

// convert hex array to String
let myHexArray = [0x43, 0x61, 0x74, 0x203C, 0x1F431] // an Int array
var myString = ""
for hexValue in myHexArray {
    if let scalar = UnicodeScalar(hexValue) {
        myString.append(Character(scalar))
    }
}
print(myString) // Cat‼🐱

Further reading

  • Strings and Characters docs
  • Glossary of Unicode Terms
  • Strings in Swift
  • Working with Unicode code points in Swift
like image 115
Suragch Avatar answered Oct 02 '22 14:10

Suragch