
How to split a Korean word into its components?

Tags:

swift

unicode

For example, the character 김 is made up of ㄱ, ㅣ and ㅁ. I need to split a Korean word into its components to get those 3 characters.

I tried the following, but it doesn't produce the right output:

let str = "김"
let utf8 = str.utf8
let first:UInt8 = utf8.first!
let char = Character(UnicodeScalar(first))

The problem is that this code returns ê when it should return ㄱ.

eskimo asked Aug 31 '25 17:08


1 Answer

Use decomposedStringWithCompatibilityMapping (available on String once Foundation is imported) to decompose the string, then iterate over its Unicode scalars to get the individual components. Something like this:

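import Foundation  // provides decomposedStringWithCompatibilityMapping

let string = "김"
// NFKD normalization splits the precomposed syllable into its jamo scalars.
for scalar in string.decomposedStringWithCompatibilityMapping.unicodeScalars {
  print("\(scalar) ")
}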

Output:

ᄀ 
ᅵ 
ᆷ 
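As for why the original attempt printed ê: utf8.first is only the first byte of the three-byte UTF-8 encoding of 김 (U+AE40), and turning that single byte into a UnicodeScalar yields the unrelated Latin-1 code point U+00EA. A small sketch that makes this visible (the variable names are just for illustration):

```swift
let str = "\u{AE40}"                            // 김
let bytes = Array(str.utf8)                     // the UTF-8 code units: EA B9 80
let firstByteAsScalar = UnicodeScalar(bytes[0]) // byte 0xEA reinterpreted as U+00EA

print(bytes.map { String($0, radix: 16) })      // ["ea", "b9", "80"]
print(Character(firstByteAsScalar))             // ê
```

So the byte view is the wrong level to work at here; the decomposition has to happen on Unicode scalars.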

You can also collect the components as an array of strings:

let chars = string.decomposedStringWithCompatibilityMapping.unicodeScalars.map { String($0) }
print(chars)
// ["ᄀ", "ᅵ", "ᆷ"]
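Note that the scalars above are conjoining jamo (from the U+1100 block), not the standalone compatibility jamo ㄱ, ㅣ, ㅁ (from the U+3130 block) shown in the question. If the standalone forms are what you need, one possible approach is to skip Foundation and use the arithmetic layout of the precomposed Hangul syllable block (U+AC00 to U+D7A3, 19 initials × 21 vowels × 28 finals). The sketch below is not part of any Apple API; the function name compatibilityJamo(of:) and the table names are made up for illustration, and it assumes the input is a single precomposed syllable:

```swift
// Compatibility jamo in the order used by the precomposed syllable block.
let initials = Array("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")                 // 19 leading consonants
let medials = Array("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")               // 21 vowels
let finals = Array("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")   // 27 trailing consonants

// Hypothetical helper: splits one precomposed Hangul syllable into
// compatibility jamo. Anything else is returned unchanged.
func compatibilityJamo(of syllable: Character) -> [Character] {
    guard syllable.unicodeScalars.count == 1,
          let scalar = syllable.unicodeScalars.first,
          (0xAC00...0xD7A3).contains(scalar.value) else {
        return [syllable]
    }
    let index = Int(scalar.value - 0xAC00)
    let leadIndex = index / 588          // 588 = 21 vowels * 28 final slots
    let vowelIndex = (index % 588) / 28
    let tailIndex = index % 28           // 0 means "no final consonant"

    var result = [initials[leadIndex], medials[vowelIndex]]
    if tailIndex > 0 {
        result.append(finals[tailIndex - 1])
    }
    return result
}

print(compatibilityJamo(of: "김"))   // ["ㄱ", "ㅣ", "ㅁ"]
```

For a whole word, something like word.flatMap { compatibilityJamo(of: $0) } would apply this per character. Already-decomposed input is passed through untouched in this sketch.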

Korean-related information from the Apple docs:

Extended grapheme clusters are a flexible way to represent many complex script characters as a single Character value. For example, Hangul syllables from the Korean alphabet can be represented as either a precomposed or decomposed sequence. Both of these representations qualify as a single Character value in Swift:

let precomposed: Character = "\u{D55C}"                  // 한
let decomposed: Character = "\u{1112}\u{1161}\u{11AB}"   // ᄒ, ᅡ, ᆫ
// precomposed is 한, decomposed is 한
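
To see what that passage means in practice, here is a short check (variable names are illustrative): the two values compare equal as Character, because Swift compares characters by canonical equivalence, yet they hold a different number of Unicode scalars.

```swift
let precomposedSyllable: Character = "\u{D55C}"                 // one scalar
let decomposedSyllable: Character = "\u{1112}\u{1161}\u{11AB}"  // three conjoining jamo

print(precomposedSyllable == decomposedSyllable)   // true
print(precomposedSyllable.unicodeScalars.count)    // 1
print(decomposedSyllable.unicodeScalars.count)     // 3
```

This is also why iterating over characters never exposes the jamo, while iterating over the unicodeScalars of a decomposed string does.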
Kamran answered Sep 03 '25 19:09