
Search or compare within a Grapheme Cluster in Korean

In my current implementation of a UISearchDisplayController I'm using -[NSString compare:] inside the filterContentForSearchText:scope: method to match objects by their name property and return them to the results UITableView as you type.

So far this works great in English and Korean, but what I'd like to be able to do is search within NSString's defined character clusters. This is only applicable to a handful of languages, of which Korean is one.

In English, compare: returns new results after every letter you enter, but in Korean the results are generated only once you complete a recognized grapheme cluster. I would like to be able to search my Korean objects' name property by the individual elements (jamo) that make up a syllable.

Can anyone shed any light on how to approach this? I'm sure it has something to do with searching through UTF16 characters manually, or by utilising a lower level class.

Cheers!

Here is a specific example that's just not working:

NSString *string1 = @"이";
NSString *string2 = @"ㅣ";
NSRange resultRange = [[string1 decomposedStringWithCanonicalMapping] rangeOfString:[string2 decomposedStringWithCanonicalMapping] options:NSLiteralSearch];

resultRange.location is always NSNotFound, with or without decomposedStringWithCanonicalMapping.

Any ideas?

Jessedc asked Jan 21 '10 04:01


2 Answers

I'm no expert, but I think you're very unlikely to find a clean solution for what you want. There doesn't seem to be any relationship between a Korean character's Unicode value and the graphemes that it's made up of.

e.g. "이" is \uc774 and "ㅣ" is \u3163. From the perspective of the NSString, they're just two different characters with no specific relationship to each other.

I suspect that you will have to find or create an explicit mapping between characters and their graphemes, and then write your own search function that consults this mapping.

This very long page on Unicode Korean can help you, if it comes to that. It has a table of all the characters which suggests some structured relation between the way characters are numbered and their components.
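The structured relation mentioned above can be sketched in code. The block below is a minimal illustration in plain C (which also compiles inside an Objective-C source file), based on the arithmetic Hangul syllable decomposition described in the Unicode Standard. The names `HangulJamo`, `hangul_decompose`, and `compat_vowel_to_conjoining` are hypothetical helpers invented for this example, not part of Foundation. The vowel mapping covers only the contiguous compatibility vowels U+314F..U+3163; consonants would need an explicit lookup table, which is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants from the Unicode Standard's Hangul syllable decomposition. */
enum {
    S_BASE = 0xAC00,              /* first precomposed syllable, 가 */
    L_BASE = 0x1100,              /* first leading consonant */
    V_BASE = 0x1161,              /* first vowel */
    T_BASE = 0x11A7,              /* one before the first trailing consonant */
    V_COUNT = 21,
    T_COUNT = 28,
    N_COUNT = V_COUNT * T_COUNT,  /* 588 syllables per leading consonant */
    S_COUNT = 19 * N_COUNT        /* 11172 precomposed syllables in total */
};

/* Hypothetical result type: the conjoining jamo of one syllable. */
typedef struct {
    uint16_t lead;
    uint16_t vowel;
    uint16_t tail;                /* 0 when there is no final consonant */
} HangulJamo;

/* Splits a precomposed syllable into its jamo.
   Returns false if the code unit is not a precomposed Hangul syllable. */
static bool hangul_decompose(uint16_t syllable, HangulJamo *out) {
    if (syllable < S_BASE || syllable >= S_BASE + S_COUNT) {
        return false;
    }
    int index = syllable - S_BASE;
    int t = index % T_COUNT;
    out->lead  = (uint16_t)(L_BASE + index / N_COUNT);
    out->vowel = (uint16_t)(V_BASE + (index % N_COUNT) / T_COUNT);
    out->tail  = (uint16_t)(t ? T_BASE + t : 0);
    return true;
}

/* Maps a compatibility vowel (U+314F..U+3163, what a keyboard typically
   produces for a lone vowel) to the conjoining vowel (U+1161..U+1175).
   Any other value is returned unchanged. */
static uint16_t compat_vowel_to_conjoining(uint16_t c) {
    if (c >= 0x314F && c <= 0x3163) {
        return (uint16_t)(V_BASE + (c - 0x314F));
    }
    return c;
}
```

With these helpers, "이" (U+C774) decomposes to U+110B and U+1175, and the typed "ㅣ" (U+3163) maps to U+1175, so the two can be compared directly. In an NSString-based search you would apply this to each `unichar` obtained via `characterAtIndex:`.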

lawrence answered Oct 08 '22 05:10


If you use compare:options: with NSLiteralSearch, it should compare character by character, that is, by Unicode code points, regardless of the grapheme. The default behavior of compare: is to use no options. You could use -decomposedStringWithCanonicalMapping to get the decomposed form of the input string, but I'm not sure how that would interact with compare:.
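To make the literal comparison concrete, here is a small sketch in plain C (usable from an Objective-C file) of what a code-unit-by-code-unit search does. The function `find_units` is a hypothetical stand-in written for this example, not the Foundation implementation. It suggests why the question's example fails: canonical decomposition turns "이" into the conjoining jamo U+110B U+1175, while "ㅣ" is the compatibility jamo U+3163, which canonical decomposition leaves unchanged, so no code unit ever matches.

```c
#include <stddef.h>
#include <stdint.h>

/* Literal search over UTF-16 code units: returns the index of the first
   occurrence of needle in hay, or -1 if it does not occur. */
static long find_units(const uint16_t *hay, size_t hay_len,
                       const uint16_t *needle, size_t needle_len) {
    if (needle_len == 0 || needle_len > hay_len) {
        return -1;
    }
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        size_t k = 0;
        while (k < needle_len && hay[i + k] == needle[k]) {
            k++;
        }
        if (k == needle_len) {
            return (long)i;
        }
    }
    return -1;
}
```

Searching {U+C774} or {U+110B, U+1175} for U+3163 returns -1, whereas searching {U+110B, U+1175} for U+1175 returns 1. As far as I know, -decomposedStringWithCompatibilityMapping (NFKD) maps U+3163 to U+1175, so applying it to both strings instead of the canonical mapping may be worth trying; treat that as an assumption to verify rather than a confirmed fix.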

Don answered Oct 08 '22 05:10