
string.IndexOf() not recognizing modified characters

Tags:

c#

indexof

When I use IndexOf to find a char that is followed by a high-valued char (e.g. char 700, which is ʼ), IndexOf fails to recognize the char I am looking for.

e.g.

string find = "abcʼabcabc";   
int index = find.IndexOf("c");

In this code, index should be 2, but it returns 6.

Is there a way to get around this?

puser asked Oct 21 '13 13:10


2 Answers

Unicode character 700 (U+02BC) is a modifier apostrophe: in other words, a linguistic comparison treats it as modifying the letter c before it. In the same way, if you were to use an 'e' followed by character 769 (0x301, the combining acute accent), it would not really be an 'e' anymore: the e has been modified to be an e with an acute accent. To wit: é. You'll see that letter is actually two characters: copy it to Notepad and hit backspace (neat, huh?).

You need to do an "Ordinal" comparison, which compares the strings' UTF-16 char values one by one without applying any linguistic rules. That will find the 'c' and ignore the linguistic fact that it is modified by the next character. In my 'e' example, the char values are (101)(769), so if you go char-by-char looking for 101, you will find it, and that ignores the fact that (101)(769) is linguistically the same as (233): é. If you search for (233) linguistically, it will find the "equivalent" (101)(769):

string find = "abe\u0301abcabc";       // 'e' followed by U+0301 (769): two chars
int index = find.IndexOf("\u00E9");    // U+00E9 (233), one char; gives you 2 even though the match in "find" is two chars long

Hopefully that's not too confusing. If you're doing this in real code, you should explain in comments exactly what you're doing. As in my 'e' example, you would generally want linguistic equivalence for user data and ordinal equivalence for things like constants (which hopefully wouldn't differ like this, lest your successor hunt you down with an axe).
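To make the difference concrete, here is a minimal sketch of the two deterministic options: a plain ordinal search, and an ordinal search after composing both strings to Normalization Form C. The class and helper names (`OrdinalVersusNormalized`, `OrdinalIndexOf`, `ComposedIndexOf`) are made up for illustration; only `IndexOf`, `StringComparison.Ordinal` and `Normalize` come from the framework.

```csharp
using System;
using System.Text;

public static class OrdinalVersusNormalized
{
    // Hypothetical helper: ordinal search, comparing UTF-16 char values only.
    public static int OrdinalIndexOf(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal);
    }

    // Hypothetical helper: compose both strings to NFC first, then search
    // ordinally. The returned index refers to the composed text, which can
    // be shorter than the original.
    public static int ComposedIndexOf(string text, string value)
    {
        string composedText = text.Normalize(NormalizationForm.FormC);
        string composedValue = value.Normalize(NormalizationForm.FormC);
        return composedText.IndexOf(composedValue, StringComparison.Ordinal);
    }

    public static void Main()
    {
        // 'e' (101) followed by the combining acute accent U+0301 (769).
        string decomposed = "abe\u0301abcabc";

        // The bare 'e' is found, because the accent after it is ignored.
        Console.WriteLine(OrdinalIndexOf(decomposed, "e"));       // 2

        // The precomposed U+00E9 (233) never occurs as a char in the text.
        Console.WriteLine(OrdinalIndexOf(decomposed, "\u00E9"));  // -1

        // After composing, e + U+0301 becomes U+00E9, so the search matches.
        Console.WriteLine(ComposedIndexOf(decomposed, "\u00E9")); // 2
    }
}
```

Normalizing only covers canonically equivalent sequences such as e + U+0301. It is not a substitute for a culture-aware comparison when you really do want linguistic matching on user data.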

Mark Sowul answered Oct 23 '22 00:10


The default IndexOf(string) overload does a linguistic (culture-sensitive) comparison, so it treats the c followed by ʼ as something different from a plain c. Use the Ordinal string comparison to force a char-by-char comparison.

string find = "abcʼabcabc";
int index = find.IndexOf("c", StringComparison.Ordinal);
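As a rough sketch of how the overloads differ, the snippet below runs the same search three ways. The class and method names are invented for illustration. The string is written with the `\u02BC` escape so the modifier apostrophe cannot be lost in copy and paste. No expected value is given for the culture-sensitive call, because its result depends on the runtime and the current culture.

```csharp
using System;

public static class IndexOfOverloads
{
    // IndexOf(string) with no comparison argument uses the current culture.
    public static int CultureSensitive(string text)
    {
        return text.IndexOf("c");
    }

    // Explicit ordinal search for a string.
    public static int OrdinalString(string text)
    {
        return text.IndexOf("c", StringComparison.Ordinal);
    }

    // IndexOf(char) always performs an ordinal search.
    public static int OrdinalChar(string text)
    {
        return text.IndexOf('c');
    }

    public static void Main()
    {
        string find = "abc\u02BCabcabc"; // U+02BC is char 700, the modifier apostrophe

        // Culture-sensitive: the question reports 6 here; the value may vary.
        Console.WriteLine(CultureSensitive(find));

        Console.WriteLine(OrdinalString(find)); // 2
        Console.WriteLine(OrdinalChar(find));   // 2
    }
}
```

If you are only looking for a single character, passing a `char` instead of a `string` is the shortest way to get ordinal behavior.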
Loofer answered Oct 23 '22 00:10