
Converting Character and CodePoint in Swift

Tags: string, swift

Can I convert directly between a Swift Character and its Unicode numeric value? That is:

var i:Int = ...  // A plain integer index.
var myCodeUnit:UInt16 = myString.utf16[i]
// Would like to say myChar = myCodeUnit as Character, or equivalent.

or...

var j:String.Index = ... // NOT an integer!
var myChar:Character = myString[j]
// Would like to say myCodeUnit = myChar as UInt16

I can say:

myCodeUnit = String(myChar).utf16[0]

but this means creating a new String for each character. And I am doing this thousands of times (parsing text) so that is a lot of new Strings that are immediately being discarded.

Andrew Duncan asked Jan 11 '23 11:01


2 Answers

The type Character represents a "Unicode grapheme cluster", which can be multiple Unicode codepoints. If you want one Unicode codepoint, you should use the type UnicodeScalar instead.
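
To make the distinction concrete, here is a minimal sketch (the variable names are illustrative, not from the question). It assumes a Swift version where `Character` exposes a `unicodeScalars` view and `Unicode.Scalar` is the spelling of `UnicodeScalar`:

```swift
// A Character is an extended grapheme cluster, so it may hold several scalars.
let accented: Character = "e\u{301}"   // "e" followed by COMBINING ACUTE ACCENT
let scalarValues = accented.unicodeScalars.map { $0.value }
// scalarValues contains two code points: 101 and 769

// A Unicode.Scalar is exactly one code point, so the numeric conversion is direct.
let scalar: Unicode.Scalar = "A"
let codePoint: UInt32 = scalar.value        // 65
let backToCharacter = Character(scalar)     // "A"
```

Neither direction needs an intermediate String, which matters when the conversion runs thousands of times in a parser.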

newacct answered Jan 16 '23 20:01


As per the Swift book:

String to Code Unit

To get the code point (ordinal) of each Unicode scalar in the String, you can do the following:

let yourSwiftString = "甲乙丙丁"
for scalar in yourSwiftString.unicodeScalars {
    print("\(scalar.value) ")
}
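
Note that the loop above yields code points, not UTF-16 code units. The two only coincide for characters in the Basic Multilingual Plane. A small sketch of the difference, with an emoji appended to the sample string as an illustrative assumption:

```swift
// The four CJK characters plus U+1F600, which lies outside the BMP.
let text = "甲乙丙丁\u{1F600}"

// One UInt32 per Unicode scalar (code point).
let codePoints = text.unicodeScalars.map { $0.value }

// One UInt16 per UTF-16 code unit; the emoji takes a surrogate pair.
let codeUnits = Array(text.utf16)

print(codePoints.count, codeUnits.count)
```

If the parser really needs `UInt16` values, iterating `text.utf16` directly avoids creating a String per character.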

Code Unit to String

Because Swift currently does not have a way to convert ordinals/code points back to a string, the best way I found is still to use NSString. That is, if you have Int ordinals (32-bit values representing 21-bit code points), you can use the following to convert them to Unicode text:

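import Foundation

var i: UInt32 = 22247  // code point U+56E7, stored as exactly 4 bytes (little-endian host assumed)
let unicodeStr = NSString(bytes: &i, length: 4, encoding: String.Encoding.utf32LittleEndian.rawValue)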

If you want to convert several ordinals at once, you'll need to pack them into a contiguous array of 32-bit values first and pass a length of 4 bytes per element.
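
In Swift versions that provide the failable `Unicode.Scalar(UInt32)` initializer, a Foundation-free route may also be possible. The sketch below is one way to do it, not the approach from the Swift book; `string(fromCodePoints:)` is a hypothetical helper name, and silently skipping invalid values is an assumption made here for brevity:

```swift
/// Hypothetical helper: builds a String from UInt32 code points.
/// Values that are not valid Unicode scalars (e.g. surrogates) are skipped.
func string(fromCodePoints codePoints: [UInt32]) -> String {
    var scalars = String.UnicodeScalarView()
    for value in codePoints {
        // The initializer returns nil for surrogates and out-of-range values.
        if let scalar = Unicode.Scalar(value) {
            scalars.append(scalar)
        }
    }
    return String(scalars)
}

// Single code point, the same value used in the NSString example.
let single: String? = Unicode.Scalar(UInt32(22247)).map { String(Character($0)) }

// An array of code points converted in one pass.
let joined = string(fromCodePoints: [30002, 20057, 19993, 19969])
print(joined)
```

A stricter variant could return nil or throw on the first invalid value instead of skipping it.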

Daniel Chu answered Jan 16 '23 22:01