When using IndexOf to find a char that is followed by a large-valued char (e.g. char 700, which is ʼ), IndexOf fails to recognize the char you are looking for.
e.g.
string find = "abcʼabcabc";
int index = find.IndexOf("c");
In this code, index should be 2, but it returns 6.
Is there a way to get around this?
Unicode letter 700 is a modifier apostrophe: in other words, it modifies the letter c. In the same way, if you were to use an 'e' followed by character 769 (0x301), it would not really be an 'e' anymore: the e has been modified to be e with an acute accent. To wit: é. You'll see that letter is actually two characters: copy it to notepad and hit backspace (neat, huh?).
You need to do an "Ordinal" comparison (character by character, comparing the raw numeric values) without any linguistic comparison. That will find the 'c', and ignore the linguistic fact that it is modified by the next letter. In my 'e' example, the characters are (101)(769), so if you go character by character looking for 101, you will find it, and that ignores the fact that (101)(769) is linguistically the same as (233): é. If you search for (233) linguistically it will find the "equivalent" (101)(769):
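string find = "abe\u0301abcabc"; // 'e' followed by combining acute accent (769)
int index = find.IndexOf("\u00E9"); // the single character é (233); gives you '2' even though the match in "find" is two characters and the IndexOf argument is one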
Hopefully that's not too confusing. If you're doing this in real code you should explain in comments exactly what you're doing: as in my 'e' example generally you would want to do semantic equivalence for user data, and ordinal equivalence for e.g. constants (which hopefully wouldn't be different like this, lest your successor hunt you down with an axe).
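To make the 'e' example concrete, here is a minimal sketch of what the ordinal side of it looks like. The class and member names (CombiningMarkDemo, Decomposed, Precomposed, OrdinalIndexOf) are made up for illustration; only string.IndexOf and StringComparison.Ordinal come from .NET. It deliberately shows only the ordinal results, since those do not depend on the culture or runtime version.

```csharp
using System;

public static class CombiningMarkDemo
{
    // 'e' (decimal 101) followed by the combining acute accent (decimal 769, 0x301).
    public const string Decomposed = "abe\u0301abcabc";

    // The single precomposed character é (decimal 233, 0xE9).
    public const string Precomposed = "\u00E9";

    // Hypothetical helper: compares raw char values, no linguistic rules.
    public static int OrdinalIndexOf(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal);
    }

    public static void Main()
    {
        // The accented letter occupies two chars, so the string is 10 chars long.
        Console.WriteLine(Decomposed.Length);                       // 10

        // Ordinal search finds the bare 'e' even though an accent follows it.
        Console.WriteLine(OrdinalIndexOf(Decomposed, "e"));         // 2

        // Ordinal search does not treat (101)(769) as equal to (233).
        Console.WriteLine(OrdinalIndexOf(Decomposed, Precomposed)); // -1
    }
}
```

So ordinal search sees the 'e' and the accent as two unrelated values, which is exactly why it finds the 'e' and why it cannot match the precomposed é.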
The cʼ construct is being handled as linguistically different from a plain c. Use the Ordinal string comparison to force a character-by-character comparison of the raw values.
string find = "abcʼabcabc";
int index = find.IndexOf("c", StringComparison.Ordinal);
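For anyone who wants to try this, a small self-contained sketch follows. OrdinalSearchDemo and FindOrdinal are hypothetical names, and the ʼ is written as the escape \u02BC (decimal 700) so it survives copy and paste. It also shows that, as far as I know from the documentation, the IndexOf overload that takes a char (rather than a string) already does an ordinal search.

```csharp
using System;

public static class OrdinalSearchDemo
{
    // Hypothetical helper wrapping the ordinal overload of IndexOf.
    public static int FindOrdinal(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal);
    }

    public static void Main()
    {
        // "abcʼabcabc" with the modifier apostrophe written as an escape.
        string find = "abc\u02BCabcabc";

        // String overload with an explicit ordinal comparison.
        Console.WriteLine(FindOrdinal(find, "c")); // 2

        // Char overload: searches by raw char value, no culture rules applied.
        Console.WriteLine(find.IndexOf('c'));      // 2
    }
}
```

If you only ever look for a single character, passing a char literal ('c') instead of a string ("c") is the shortest way around the problem.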