How can I compare 2 strings in C# ignoring the case, spaces and any line-breaks. I also need to check if both strings are null then they are marked as same.
Thanks!
You should normalize each string by removing the characters that you don't want to compare and then you can perform a String.Equals
with a StringComparison
that ignores case.
Something like this:
string s1 = "HeLLo wOrld!"; string s2 = "Hello\n WORLd!"; string normalized1 = Regex.Replace(s1, @"\s", ""); string normalized2 = Regex.Replace(s2, @"\s", ""); bool stringEquals = String.Equals( normalized1, normalized2, StringComparison.OrdinalIgnoreCase); Console.WriteLine(stringEquals);
Here Regex.Replace
is used first to remove all whitespace characters. The special case of both strings being null is not treated here but you can easily handle that case before performing the string normalization.
This may also work.
String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols) == 0
Edit:
IgnoreSymbols: Indicates that the string comparison must ignore symbols, such as white-space characters, punctuation, currency symbols, the percent sign, mathematical symbols, the ampersand, and so on.
Remove all the characters you don't want and then use the ToLower() method to ignore case.
edit: While the above works, it's better to use StringComparison.OrdinalIgnoreCase
. Just pass it as the second argument to the Equals
method.
First replace all whitespace via regular expression from both string and then use the String.Compare
method with parameter ignoreCase = true.
string a = System.Text.RegularExpressions.Regex.Replace("void foo", @"\s", "");
string b = System.Text.RegularExpressions.Regex.Replace("voidFoo", @"\s", "");
bool isTheSame = String.Compare(a, b, true) == 0;
If you need performance, the Regex solutions on this page run too slow for you. Maybe you have a large list of strings you want to sort. (A Regex solution is more readable however)
I have a class that looks at each individual char in both strings and compares them while ignoring case and whitespace. It doesn't allocate any new strings. It uses the char.IsWhiteSpace(ch)
to determine whitespace, and char.ToLowerInvariant(ch)
for case-insensitivity (if required). In my testing, my solution runs about 5x - 8x faster than a Regex-based solution. My class also implements IEqualityComparer's GetHashCode(obj)
method using this code in another SO answer. This GetHashCode(obj)
also ignores whitespace and optionally ignores case.
Here's my class:
private class StringCompIgnoreWhiteSpace : IEqualityComparer<string>
{
public bool Equals(string strx, string stry)
{
if (strx == null) //stry may contain only whitespace
return string.IsNullOrWhiteSpace(stry);
else if (stry == null) //strx may contain only whitespace
return string.IsNullOrWhiteSpace(strx);
int ix = 0, iy = 0;
for (; ix < strx.Length && iy < stry.Length; ix++, iy++)
{
char chx = strx[ix];
char chy = stry[iy];
//ignore whitespace in strx
while (char.IsWhiteSpace(chx) && ix < strx.Length)
{
ix++;
chx = strx[ix];
}
//ignore whitespace in stry
while (char.IsWhiteSpace(chy) && iy < stry.Length)
{
iy++;
chy = stry[iy];
}
if (ix == strx.Length && iy != stry.Length)
{ //end of strx, so check if the rest of stry is whitespace
for (int iiy = iy + 1; iiy < stry.Length; iiy++)
{
if (!char.IsWhiteSpace(stry[iiy]))
return false;
}
return true;
}
if (ix != strx.Length && iy == stry.Length)
{ //end of stry, so check if the rest of strx is whitespace
for (int iix = ix + 1; iix < strx.Length; iix++)
{
if (!char.IsWhiteSpace(strx[iix]))
return false;
}
return true;
}
//The current chars are not whitespace, so check that they're equal (case-insensitive)
//Remove the following two lines to make the comparison case-sensitive.
chx = char.ToLowerInvariant(chx);
chy = char.ToLowerInvariant(chy);
if (chx != chy)
return false;
}
//If strx has more chars than stry
for (; ix < strx.Length; ix++)
{
if (!char.IsWhiteSpace(strx[ix]))
return false;
}
//If stry has more chars than strx
for (; iy < stry.Length; iy++)
{
if (!char.IsWhiteSpace(stry[iy]))
return false;
}
return true;
}
public int GetHashCode(string obj)
{
if (obj == null)
return 0;
int hash = 17;
unchecked // Overflow is fine, just wrap
{
for (int i = 0; i < obj.Length; i++)
{
char ch = obj[i];
if(!char.IsWhiteSpace(ch))
//use this line for case-insensitivity
hash = hash * 23 + char.ToLowerInvariant(ch).GetHashCode();
//use this line for case-sensitivity
//hash = hash * 23 + ch.GetHashCode();
}
}
return hash;
}
}
private static void TestComp()
{
var comp = new StringCompIgnoreWhiteSpace();
Console.WriteLine(comp.Equals("abcd", "abcd")); //true
Console.WriteLine(comp.Equals("abCd", "Abcd")); //true
Console.WriteLine(comp.Equals("ab Cd", "Ab\n\r\tcd ")); //true
Console.WriteLine(comp.Equals(" ab Cd", " A b" + Environment.NewLine + "cd ")); //true
Console.WriteLine(comp.Equals(null, " \t\n\r ")); //true
Console.WriteLine(comp.Equals(" \t\n\r ", null)); //true
Console.WriteLine(comp.Equals("abcd", "abcd h")); //false
Console.WriteLine(comp.GetHashCode(" a b c d")); //-699568861
//This is -699568861 if you #define StringCompIgnoreWhiteSpace_CASE_INSENSITIVE
// Otherwise it's -1555613149
Console.WriteLine(comp.GetHashCode("A B c \t d"));
}
Here's my testing code (with a Regex example):
private static void SpeedTest()
{
const int loop = 100000;
string first = "a bc d";
string second = "ABC D";
var compChar = new StringCompIgnoreWhiteSpace();
Stopwatch sw1 = Stopwatch.StartNew();
for (int i = 0; i < loop; i++)
{
bool equals = compChar.Equals(first, second);
}
sw1.Stop();
Console.WriteLine(string.Format("char time = {0}", sw1.Elapsed)); //char time = 00:00:00.0361159
var compRegex = new StringCompIgnoreWhiteSpaceRegex();
Stopwatch sw2 = Stopwatch.StartNew();
for (int i = 0; i < loop; i++)
{
bool equals = compRegex.Equals(first, second);
}
sw2.Stop();
Console.WriteLine(string.Format("regex time = {0}", sw2.Elapsed)); //regex time = 00:00:00.2773072
}
private class StringCompIgnoreWhiteSpaceRegex : IEqualityComparer<string>
{
public bool Equals(string strx, string stry)
{
if (strx == null)
return string.IsNullOrWhiteSpace(stry);
else if (stry == null)
return string.IsNullOrWhiteSpace(strx);
string a = System.Text.RegularExpressions.Regex.Replace(strx, @"\s", "");
string b = System.Text.RegularExpressions.Regex.Replace(stry, @"\s", "");
return String.Compare(a, b, true) == 0;
}
public int GetHashCode(string obj)
{
if (obj == null)
return 0;
string a = System.Text.RegularExpressions.Regex.Replace(obj, @"\s", "");
return a.GetHashCode();
}
}
I would probably start by removing the characters you don't want to compare from the string before comparing. If performance is a concern, you might look at storing a version of each string with the characters already removed.
Alternatively, you could write a compare routine that would skip over the characters you want to ignore. But that just seems like more work to me.
You can also use the following custom function
public static string ExceptChars(this string str, IEnumerable<char> toExclude)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.Length; i++)
{
char c = str[i];
if (!toExclude.Contains(c))
sb.Append(c);
}
return sb.ToString();
}
public static bool SpaceCaseInsenstiveComparision(this string stringa, string stringb)
{
return (stringa==null&&stringb==null)||stringa.ToLower().ExceptChars(new[] { ' ', '\t', '\n', '\r' }).Equals(stringb.ToLower().ExceptChars(new[] { ' ', '\t', '\n', '\r' }));
}
And then use it following way
"Te st".SpaceCaseInsenstiveComparision("Te st");
Another option is the LINQ SequenceEquals
method which according to my tests is more than twice as fast as the Regex approach used in other answers and very easy to read and maintain.
public static bool Equals_Linq(string s1, string s2)
{
return Enumerable.SequenceEqual(
s1.Where(c => !char.IsWhiteSpace(c)).Select(char.ToUpperInvariant),
s2.Where(c => !char.IsWhiteSpace(c)).Select(char.ToUpperInvariant));
}
public static bool Equals_Regex(string s1, string s2)
{
return string.Equals(
Regex.Replace(s1, @"\s", ""),
Regex.Replace(s2, @"\s", ""),
StringComparison.OrdinalIgnoreCase);
}
Here the simple performance test code I used:
var s1 = "HeLLo wOrld!";
var s2 = "Hello\n WORLd!";
var watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
Equals_Linq(s1, s2);
}
Console.WriteLine(watch.Elapsed); // ~1.7 seconds
watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
Equals_Regex(s1, s2);
}
Console.WriteLine(watch.Elapsed); // ~4.6 seconds
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With