This question is not about managing Windows pathnames; I used that only as a specific example of a case-insensitive string. (And if I change the example now, a whole bunch of comments will become meaningless.)
This may be similar to Possible to create case insensitive string class?, but there isn't a lot of discussion there. Also, I don't really care about the tight language integration that string enjoys or the performance optimizations of System.String.
Let's say I use a lot of Windows pathnames which are (normally) case-insensitive (I'm not actually concerned with the many details of actual paths like \ vs. /, \\\\ being the same as \, file:// URLs, .., etc.). A simple wrapper might be:
using System;

sealed class WindowsPathname : IEquatable<WindowsPathname> /* TODO: more interfaces from System.String */
{
    public WindowsPathname(string path)
    {
        if (path == null) throw new ArgumentNullException(nameof(path));
        Value = path;
    }

    public string Value { get; }

    public override int GetHashCode()
    {
        // Hash with the same comparer that Equals uses, so equal values hash alike.
        return StringComparer.OrdinalIgnoreCase.GetHashCode(Value);
    }

    public override string ToString()
    {
        return Value;
    }

    public override bool Equals(object obj)
    {
        // Note: accepting a raw string here is asymmetric, because
        // string.Equals(WindowsPathname) is always false.
        var strObj = obj as string;
        if (strObj != null)
            return Equals(new WindowsPathname(strObj));
        var other = obj as WindowsPathname;
        if (other != null)
            return Equals(other);
        return false;
    }

    public bool Equals(WindowsPathname other)
    {
        // A LOT more needs to be done to make Windows pathnames equal.
        // This is just a specific example of the need for a case-insensitive string
        if (other == null) return false;
        return Value.Equals(other.Value, StringComparison.OrdinalIgnoreCase);
    }
}
Yes, all/most of the interfaces on System.String should probably be implemented; but the above seems like enough for discussion purposes.
I can now write:
var p1 = new WindowsPathname(@"c:\foo.txt");
var p2 = new WindowsPathname(@"C:\FOO.TXT");
bool areEqual = p1.Equals(p2); // true
This allows me to "talk about" WindowsPathnames in my code rather than an implementation detail like StringComparison.OrdinalIgnoreCase. (Yes, this specific class could also be extended to handle \ vs / so that c:/foo.txt would be equal to C:\FOO.TXT; but that's not the point of this question.) Furthermore, this class (with additional interfaces) will be case-insensitive when instances are added to collections; it would not be necessary to specify an IEqualityComparer. Finally, a specific class like this also makes it easier to prevent nonsensical operations such as comparing a file system path to a registry key.
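To make the collection claim concrete, here is a minimal sketch of what I have in mind. It uses a trimmed-down copy of the wrapper, and PathnameSetDemo and CountDistinct are hypothetical names made up for this illustration only.

```csharp
using System;
using System.Collections.Generic;

// Trimmed-down copy of the wrapper; only what a hash-based collection needs.
sealed class WindowsPathname : IEquatable<WindowsPathname>
{
    public WindowsPathname(string path)
    {
        Value = path ?? throw new ArgumentNullException(nameof(path));
    }

    public string Value { get; }

    public bool Equals(WindowsPathname other) =>
        other != null && Value.Equals(other.Value, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj) => Equals(obj as WindowsPathname);

    // Hashing and comparing use the same ordinal, case-insensitive rules.
    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(Value);
}

static class PathnameSetDemo
{
    // Hypothetical helper: counts distinct pathnames without passing
    // an IEqualityComparer to the HashSet.
    public static int CountDistinct(params string[] paths)
    {
        var set = new HashSet<WindowsPathname>();
        foreach (var p in paths)
            set.Add(new WindowsPathname(p));
        return set.Count;
    }
}
```

With this sketch, PathnameSetDemo.CountDistinct(@"c:\foo.txt", @"C:\FOO.TXT") returns 1, because the set relies on the type's own Equals and GetHashCode.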
The question is: will such an approach be successful? Are there any serious and/or subtle flaws or other "gotchas"? (Again, having to do with trying to set up a case-insensitive string class, not managing Windows pathnames.)
I would create an immutable struct that holds a string, converting the string in the constructor to a standard case (e.g. lowercase). Then you could also add an implicit operator to simplify creation, and override the comparison operators. I think this is the simplest way to achieve the behaviour, and the overhead is small because the conversion happens only in the constructor.
Here's the code:
public struct CaseInsensitiveString
{
    private readonly string _s;

    public CaseInsensitiveString(string s)
    {
        // Normalize once here so comparisons stay cheap; null becomes "".
        _s = (s ?? string.Empty).ToLowerInvariant();
    }

    public static implicit operator CaseInsensitiveString(string d)
    {
        return new CaseInsensitiveString(d);
    }

    public override bool Equals(object obj)
    {
        return obj is CaseInsensitiveString && this == (CaseInsensitiveString)obj;
    }

    public override int GetHashCode()
    {
        // _s is null for default(CaseInsensitiveString), so guard against it.
        return (_s ?? string.Empty).GetHashCode();
    }

    public static bool operator ==(CaseInsensitiveString x, CaseInsensitiveString y)
    {
        return (x._s ?? string.Empty) == (y._s ?? string.Empty);
    }

    public static bool operator !=(CaseInsensitiveString x, CaseInsensitiveString y)
    {
        return !(x == y);
    }
}
Here is the usage:
CaseInsensitiveString a = "STRING";
CaseInsensitiveString b = "string";
// a == b --> true
This works for collections as well.
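As a rough illustration of the collection case, the sketch below uses the struct as a dictionary key. It repeats a compact version of the struct so the block stands alone; treating null as the empty string is my assumption, and LookupDemo, Lookup and the sample entries are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public struct CaseInsensitiveString : IEquatable<CaseInsensitiveString>
{
    private readonly string _s;

    public CaseInsensitiveString(string s)
    {
        // Assumption: null is treated like the empty string.
        _s = (s ?? string.Empty).ToLowerInvariant();
    }

    public static implicit operator CaseInsensitiveString(string s) =>
        new CaseInsensitiveString(s);

    // default(CaseInsensitiveString) has a null _s, hence the ?? guards.
    public bool Equals(CaseInsensitiveString other) =>
        string.Equals(_s ?? string.Empty, other._s ?? string.Empty, StringComparison.Ordinal);

    public override bool Equals(object obj) =>
        obj is CaseInsensitiveString && Equals((CaseInsensitiveString)obj);

    public override int GetHashCode() => (_s ?? string.Empty).GetHashCode();
}

static class LookupDemo
{
    // Hypothetical lookup: returns the stored size, or -1 if the key is absent.
    public static int Lookup(string key)
    {
        var sizes = new Dictionary<CaseInsensitiveString, int>
        {
            ["README.txt"] = 120,
            ["Notes.TXT"] = 45,
        };
        // The string argument converts implicitly to the key type.
        return sizes.TryGetValue(key, out var size) ? size : -1;
    }
}
```

Here LookupDemo.Lookup("readme.TXT") finds the "README.txt" entry without a custom comparer. One trade-off worth keeping in mind is that the struct stores only the lowercased text, so the original casing cannot be recovered from it.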
So you want something that converts a string to an object, and if you convert two strings to two of those objects, you want to be able to compare these objects for equality with your own set of rules about the equality of the two objects.
In your example it is about upper and lower case, but it could also be about forward slashes and backward slashes; maybe you even want to define that the "word" USD equals $.
Suppose you divide the collection of all possible strings into subcollections of strings that you'd define to be equal. In that case "Hello" would be in the same subcollection as "HELLO" and "hElLO". Maybe "c:\temp" would be in the same subcollection as "c:/TEMP".
If you could find something to identify your subcollection, then you could say that all strings that belong to the same subcollection would have the same identifier. Or in other words: all strings that you defined equal would have the same identifier.
If that would be possible, then it would be enough to compare the subcollection identifier. If two strings have the same subcollection identifier, then they belong to the same subcollection and thus are considered equal according to our equality definition.
Let's call this identifier the normalized value of the string. The constructor of your CaseInsensitiveString could convert the input string into the normalized value of the string. To check two objects for equality all we have to do is check if they have the same normalized value.
An example of the normalization of a string would be:
According to the above the following Strings would all lead to the same normalized string:
We can define anything as a normalized string, as long as all strings that we define equal have the same normalized string. A good example would be
Note: I'm not going into detail about how to find words like USD or thousands separators. The important thing is that you understand the meaning of a normalized string.
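To make the idea a bit more tangible, here is one possible normalization, purely as a sketch. The rules (lower-casing, turning the word USD into $, dropping commas between digits) are assumptions chosen for illustration, and NormalizationSketch is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

static class NormalizationSketch
{
    // Maps every string of one "subcollection" to the same normalized value.
    public static string Normalize(string str)
    {
        // Rule 1 (assumed): case does not matter; null counts as empty.
        string s = (str ?? string.Empty).ToLowerInvariant();

        // Rule 2 (assumed): the word "usd", plus any following spaces,
        // means the same as "$". "$$" is a literal dollar in a replacement.
        s = Regex.Replace(s, @"\busd\s*", "$$");

        // Rule 3 (assumed): commas between two digits are thousands
        // separators and carry no meaning.
        s = Regex.Replace(s, @"(?<=\d),(?=\d)", string.Empty);

        return s;
    }
}
```

Under these assumed rules, "White House $1,000,000", "white house $1000000" and "White House USD 1,000,000" all normalize to "white house $1000000".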
Having said this, the only difficult part is to find the stringIdentifier. The rest of the class is fairly straightforward:
Code for the construction. The constructor takes a string and determines the subcollection it belongs to. I also added a default constructor.
public class CaseInsensitiveString : IEquatable<CaseInsensitiveString>
{
    // The normalized value identifies the subcollection the string belongs to.
    private string normalized = "";

    public CaseInsensitiveString(string str)
    {
        this.normalized = Normalize(str);
    }

    public CaseInsensitiveString()
    {
        this.normalized = Normalize(null);
    }
}
Equality: by definition, two objects are the same if they have the same normalized value.
See MSDN: How to Define Value Equality for a Type.
public bool Equals(CaseInsensitiveString other)
{
    // false if other is null (ReferenceEquals avoids the overloaded ==)
    if (object.ReferenceEquals(other, null)) return false;
    // optimization for same object
    if (object.ReferenceEquals(this, other)) return true;
    // false if other is a different type, for instance a subclass
    if (this.GetType() != other.GetType()) return false;
    // if here: everything the same, compare the stringIdentifier
    return this.normalized == other.normalized;
}
Note that this last line is the only code where we do actual equality checking!
All other equality functions only use the Equals function defined above:
public override bool Equals(object other)
{
    return this.Equals(other as CaseInsensitiveString);
}

public override int GetHashCode()
{
    return this.normalized.GetHashCode();
}

public static bool operator ==(CaseInsensitiveString x, CaseInsensitiveString y)
{
    if (object.ReferenceEquals(x, null))
    {   // x is null, true if y is also null
        // (ReferenceEquals, because y == null would call this operator again)
        return object.ReferenceEquals(y, null);
    }
    else
    {   // x is not null
        return x.Equals(y);
    }
}

public static bool operator !=(CaseInsensitiveString x, CaseInsensitiveString y)
{
    return !(x == y);
}
So now you can do the following:
var x = new CaseInsensitiveString("White House $1,000,000");
var y = new CaseInsensitiveString("white house $1000000");
if (x == y)
...
Now the only thing we have to implement is the Normalize function. Once you know when two strings are considered equal you know how to normalize.
Suppose we consider two strings equal if they match case-insensitively and forward slashes are treated the same as backward slashes.
If the normalize function returns the same string in lower case with all slashes converted to backward slashes, then two strings that we consider equal will have the same normalized value.
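private static string Normalize(string str)
{
    // Treat null as empty, lower-case culture-independently,
    // and turn every forward slash into a backslash.
    return (str ?? string.Empty).ToLowerInvariant().Replace('/', '\\');
}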
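For reference, the fragments above can be assembled into one compilable class roughly as follows. This is a sketch under a few assumptions of mine: the class is sealed (so the GetType check is not needed), null is treated as the empty string, and lower-casing uses the invariant culture.

```csharp
using System;

public sealed class CaseInsensitiveString : IEquatable<CaseInsensitiveString>
{
    // Identifies the subcollection of strings this instance belongs to.
    private readonly string normalized;

    public CaseInsensitiveString(string str)
    {
        normalized = Normalize(str);
    }

    public CaseInsensitiveString() : this(null) { }

    // Assumed rules: ignore case, and treat '/' the same as '\'.
    private static string Normalize(string str) =>
        (str ?? string.Empty).ToLowerInvariant().Replace('/', '\\');

    // The only place where values are actually compared.
    public bool Equals(CaseInsensitiveString other)
    {
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(this, other)) return true;
        return normalized == other.normalized;
    }

    public override bool Equals(object obj) => Equals(obj as CaseInsensitiveString);

    public override int GetHashCode() => normalized.GetHashCode();

    // ReferenceEquals keeps the null checks from re-entering this operator.
    public static bool operator ==(CaseInsensitiveString x, CaseInsensitiveString y) =>
        ReferenceEquals(x, null) ? ReferenceEquals(y, null) : x.Equals(y);

    public static bool operator !=(CaseInsensitiveString x, CaseInsensitiveString y) =>
        !(x == y);
}
```

With this version, new CaseInsensitiveString("c:/TEMP") == new CaseInsensitiveString(@"C:\temp") evaluates to true, and comparing an instance against null does not recurse.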