I'm obviously missing something here.
I'm writing a function that returns the number of substrings delimited by a particular string. Here is the rather simple function:
    public static FuncError DCount(String v1, String v2, ref Int32 result) {
        result = 0;
        if (String.IsNullOrEmpty(v1)) {
            return null;
        }
        if (String.IsNullOrEmpty(v2)) {
            return null;
        }
        int ct = 1;
        int ix = 0;
        int nix = 0;
        do {
            nix = v1.IndexOf(v2, ix);
            if (nix >= 0) {
                ct++;
                System.Diagnostics.Debug.Print(
                    string.Format("{0} found at {1} count={2} result = {3}",
                        v2, nix, ct, v1.Substring(nix,1)));
                ix = nix + v2.Length;
            }
        } while (nix >= 0);
        result = ct;
        return null;
    }
The problem comes when I call it with a special character (þ) that is being used as a separator in a particular situation. It returns lots of false positives. In the Debug.Print output, the first and last values should always be the same.
    þ found at 105 count=2 result = t
    þ found at 136 count=3 result = t
    þ found at 152 count=4 result = þ
    þ found at 249 count=5 result = t
    þ found at 265 count=6 result = t
    þ found at 287 count=7 result = t
    þ found at 317 count=8 result = t
    þ found at 333 count=9 result = þ
    þ found at 443 count=10 result = þ
    þ found at 553 count=11 result = þ
    þ found at 663 count=12 result = þ
    þ found at 773 count=13 result = þ
    þ found at 883 count=14 result = þ
    þ found at 993 count=15 result = þ
If I pass þ as a char, it works fine. If I split the string using þ as the delimiter, it returns the correct number of elements. As for the incorrectly identified 't', there are other 't's in the results that are not being picked up, so it's not a character-conversion issue.
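Both workarounds described above compare ordinally: `String.IndexOf(char)` always performs an ordinal search, and `String.Split` compares its separators ordinally, so neither can treat 'þ' as equal to "th". A minimal cross-check sketch of both approaches follows; the class name, method names, and sample string are hypothetical and made up for illustration.

```csharp
using System;

public static class OrdinalCrossCheck
{
    // Counts delimited substrings by scanning for a single char.
    // String.IndexOf(char, int) is always an ordinal search, so 'þ' never matches "th".
    public static int CountByChar(string value, char delimiter)
    {
        int count = 1;   // text with no delimiter is one substring
        int found = -1;
        while ((found = value.IndexOf(delimiter, found + 1)) >= 0)
        {
            count++;
        }
        return count;
    }

    // String.Split compares separators ordinally, so the number of pieces
    // (empty ones included) is the number of delimiter occurrences + 1.
    public static int CountBySplit(string value, string delimiter) =>
        value.Split(new[] { delimiter }, StringSplitOptions.None).Length;

    public static void Main()
    {
        // Hypothetical sample: contains "th" several times but only two real þ separators.
        string sample = "thirteenþthirtyþthe end";
        Console.WriteLine(CountByChar(sample, 'þ'));   // 3
        Console.WriteLine(CountBySplit(sample, "þ"));  // 3
    }
}
```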
Confused ...
Thanks
The problem here is how different cultures represent characters, and in some cases combine them.
The letter you're searching for, thorn (þ), can apparently be matched by the two letters "th" when the comparison is culture-sensitive.
Try this code in LINQPad:
    void Main()
    {
        string x = "uma thurman";
        x.IndexOf("þ").Dump();
    }
It will output 4.
(Note that I ran this program on a machine in Norway; that may or may not affect the results.)
This is the same "problem" as with the German letter for double s, ß, which in some cultures can be found in words containing two s's in a row.
You can use StringComparison.Ordinal to get culture-agnostic string matching.
Using Lasse V. Karlsen's example:
    string x = "uma thurman";
    x.IndexOf("þ", StringComparison.Ordinal).Dump();
This will result in -1.
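Applied to the question's function, the fix is to pass StringComparison.Ordinal to the IndexOf call inside the loop. The sketch below is a simplified, hypothetical version of DCount: it returns an int instead of using the question's FuncError return type and ref parameter (FuncError isn't shown in the question), and drops the Debug.Print tracing.

```csharp
using System;

public static class DelimitedCount
{
    // Counts the substrings of `value` separated by `delimiter`, matching the
    // delimiter ordinally (code unit by code unit), so "þ" never matches "th".
    // Returns 0 for null/empty input, like the question's DCount.
    public static int Count(string value, string delimiter)
    {
        if (string.IsNullOrEmpty(value) || string.IsNullOrEmpty(delimiter))
        {
            return 0;
        }

        int count = 1;   // text with no delimiter is one substring
        int start = 0;
        int found;
        while ((found = value.IndexOf(delimiter, start, StringComparison.Ordinal)) >= 0)
        {
            count++;
            start = found + delimiter.Length;  // resume searching after this delimiter
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(Count("uma thurman", "þ"));  // 1
        Console.WriteLine(Count("aþbþc", "þ"));        // 3
    }
}
```

This gives the same count as splitting on the delimiter, since String.Split also compares ordinally.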
See Best Practices for Using Strings in the .NET Framework for more information.