When searching the text "Çınaraltı Café" for the text "Ci" using the code
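    // The strings from the example above
    NSString *haystack = @"Çınaraltı Café";
    NSString *needle = @"Ci";
    // Ignore case, diacritics and full-width/half-width differences
    NSStringCompareOptions options =
        NSCaseInsensitiveSearch |
        NSDiacriticInsensitiveSearch |
        NSWidthInsensitiveSearch;
    NSLocale *locale = [NSLocale localeWithLocaleIdentifier:@"tr"];
    // Search the whole string, starting at index 0
    NSRange range = [haystack rangeOfString:needle
                                    options:options
                                      range:NSMakeRange(0, haystack.length)
                                     locale:locale];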
I get range.location equal to NSNotFound.
It's not to do with the diacritic on the initial Ç, because I get the same result searching for "alti", where the only odd character is the ı. I also get a valid match searching for "Cafe", even though the text being searched contains a diacritic there (the é).
The Apple docs mention this situation in the notes on the locale parameter, and I think I'm following them. Though I guess I'm not, because it's not working.
How can I get a search for 'i' to match both 'i' and 'ı'?
I, or ı, called dotless I, is a letter used in the Latin-script alphabets of Azerbaijani, Crimean Tatar, Gagauz, Kazakh, Tatar, and Turkish.
dotless i (plural dotless is) A letter whose uppercase version is "I" and lowercase version is "ı". A letter "I"/"i" without the lowercase dot, that is used in the Turkish language.
Turkish has two of them, one with and one without a dot, and they represent two rather different vowels. Turkish spelling is very consistent about this: a word with an initial "i" is capitalized as İ, while an initial I without a dot is the capital of ı.
You can also input a dotless I with the keyboard shortcut Shift + Option + B or Shift + Alt + B.
I don't know whether this helps as an answer, but perhaps it explains why it's happening.
I should point out I'm not an expert in this matter, but I've been looking into this for my own purposes and been doing some research.
Looking at the Unicode collation chart for Latin, the characters equivalent to ASCII "i" (\u0069) do not include "ı" (\u0131), whereas all the other letters in your example string behave as you expect, i.e.:
"c" (\u0063) does include "Ç" (\u00c7)
"e" (\u0065) does include "é" (\u00e9)
The ı character is listed separately, as being of primary difference to i. That might not make sense to a Turkish speaker (I'm not one) but it's what Unicode have to say about it, and it does fit the logic of the problem you describe.
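One related way to see the difference (a rough illustration only, not a claim about how Foundation implements diacritic-insensitive search) is to look at canonical decomposition. Ç and é each decompose into a base letter plus a combining mark, which is what leaves something to strip, while ı has no decomposition at all. The helper name DecomposedLength is made up for this sketch.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration: the number of UTF-16 code units
// in the canonical decomposition (NFD) of a string.
static NSUInteger DecomposedLength(NSString *s) {
    return s.decomposedStringWithCanonicalMapping.length;
}

// Logs the decomposed length of the three "odd" characters in the example.
static void LogDecompositions(void) {
    // Ç (U+00C7) decomposes to C + combining cedilla (U+0327): 2 units
    NSLog(@"C-cedilla: %lu", (unsigned long)DecomposedLength(@"\u00c7"));
    // é (U+00E9) decomposes to e + combining acute (U+0301): 2 units
    NSLog(@"e-acute:   %lu", (unsigned long)DecomposedLength(@"\u00e9"));
    // ı (U+0131) has no decomposition, so it stays a single unit
    NSLog(@"dotless i: %lu", (unsigned long)DecomposedLength(@"\u0131"));
}
```

So there is no "i plus a mark" hiding inside ı for a diacritic-insensitive comparison to reduce it to.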
In Chrome you can see this in action with an in-page search. Searching in the page for ASCII i highlights all the characters in its block and does not match ı. Searching for ı does the opposite.
By contrast, MySQL's utf8_general_ci collation table maps uppercase ASCII I to ı, as you want.
So, without knowing anything about iOS, I'm assuming it's using the Unicode standard and normalising all characters to Latin by this table.
As to how you match "Çınaraltı" with "Ci": if you can't override the collation table, then perhaps you can just replace i in your search strings with a regular expression, so you search on Ç[iı] instead.
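A minimal sketch of that regular-expression idea, under a few assumptions of mine: the helper name RangeOfDotlessInsensitive is made up, and folding away the diacritics first (so a plain "C" in the pattern can meet "Ç") is an extra step on top of the suggestion above. Only lowercase "i" in the search string is widened to [iı].

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: finds needle in haystack, treating "i" and dotless "ı"
// as the same letter. The returned range refers to the diacritic-folded
// haystack; it lines up with the original string only when folding does not
// change its length (e.g. precomposed characters such as Ç or é).
static NSRange RangeOfDotlessInsensitive(NSString *haystack, NSString *needle) {
    // Strip diacritics from both strings so "Ç" and "C", "é" and "e" compare equal.
    NSString *foldedHaystack =
        [haystack stringByFoldingWithOptions:NSDiacriticInsensitiveSearch locale:nil];
    NSString *foldedNeedle =
        [needle stringByFoldingWithOptions:NSDiacriticInsensitiveSearch locale:nil];

    // Escape regex metacharacters, then widen each "i" into the class [iı].
    NSString *pattern = [NSRegularExpression escapedPatternForString:foldedNeedle];
    pattern = [pattern stringByReplacingOccurrencesOfString:@"i"
                                                 withString:@"[i\u0131]"];

    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    if (regex == nil) {
        return NSMakeRange(NSNotFound, 0);
    }
    // Returns {NSNotFound, 0} when there is no match.
    return [regex rangeOfFirstMatchInString:foldedHaystack
                                    options:0
                                      range:NSMakeRange(0, foldedHaystack.length)];
}
```

Called with the strings from the question, e.g. RangeOfDotlessInsensitive(@"Çınaraltı Café", @"Ci"), this should find a match at the start of the string instead of returning NSNotFound. Width-insensitivity and the Turkish locale from the original call are deliberately left out to keep the sketch small.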