
I need to strip all the symbols from a string in order to create an `IEqualityComparer` that ignores punctuation symbols

In part of my application I have an option that displays a list of albums by the current artist that aren't in the music library. To get this I call a music API to get the list of all albums by that artist and then I remove the albums that are in the current library.

To cope with differences in the casing of names and the possibility of missing (or extra) punctuation in the title, I have written an IEqualityComparer to use in the .Except call:

var missingAlbums = allAlbums.Except(ownedAlbums, new NameComparer());

This is the Equals method:

public bool Equals(string x, string y)
{
    // Check whether the compared objects reference the same data.
    if (ReferenceEquals(x, y)) return true;

    // Check whether any of the compared objects is null.
    if (x is null || y is null)
        return false;

    return string.Compare(x, y, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols) == 0;
}

This is the GetHashCode method:

public int GetHashCode(string obj)
{
    // Check whether the object is null
    if (obj is null) return 0;

    // Make lower case. How do I strip symbols?
    return obj.ToLower().GetHashCode();
}

This fails, of course, when the string contains symbols, because I'm not removing them before getting the hash code. Two strings such as "Baa, baa, black sheep" and "Baa baa Black sheep" therefore still produce different hash codes even after converting to lower case, so they are never treated as equal.

I have written a method that will strip the symbols, but that meant I had to guess what those symbols actually are. It works for the cases I've tried so far, but I'm expecting it to fail eventually. I'd like a more reliable method of removing the symbols.

Given that the CompareOptions.IgnoreSymbols exists, is there a method I can call that will strip these characters from a string? Or failing that, a method that will return all the symbols?

I have found the IsPunctuation method for characters, but I can't determine whether what this deems to be punctuation is the same as what the string compare option deems to be a symbol.

ChrisF asked Jun 15 '21 22:06


1 Answer

If you're going to use the CompareOptions enum, I feel like you might as well use it with the CompareInfo class that it's documented as being designed for:

"Defines the string comparison options to use with CompareInfo."

Then you can just use the GetHashCode(string, CompareOptions) method from that class (and even the Compare(string, string, CompareOptions) method if you like).
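As a minimal sketch of what that could look like (the class layout, the Options constant, and the culture-taking constructor are illustrative assumptions, not code from the question), both methods can go through the same CompareInfo with the same options, so the hash code and the equality check stay consistent. Note that the GetHashCode(string, CompareOptions) overload is, as far as I know, only available in .NET Core and more recent .NET Framework versions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Sketch only: a comparer that ignores case and symbols in both
// Equals and GetHashCode by delegating to a single CompareInfo.
public sealed class NameComparer : IEqualityComparer<string>
{
    // Same options as the string.Compare call in the question.
    private const CompareOptions Options =
        CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols;

    private readonly CompareInfo _compareInfo;

    // Defaults to the current culture, as the original Equals did.
    public NameComparer() : this(CultureInfo.CurrentCulture) { }

    // Assumption: a culture can be injected, e.g. for repeatable tests.
    public NameComparer(CultureInfo culture)
    {
        _compareInfo = culture.CompareInfo;
    }

    public bool Equals(string x, string y)
    {
        // Same reference (or both null) means equal.
        if (ReferenceEquals(x, y)) return true;

        // Exactly one of them is null.
        if (x is null || y is null) return false;

        return _compareInfo.Compare(x, y, Options) == 0;
    }

    public int GetHashCode(string obj)
    {
        if (obj is null) return 0;

        // The hash is computed under the same options as the comparison,
        // so strings that compare equal should hash alike.
        return _compareInfo.GetHashCode(obj, Options);
    }
}
```

The existing call, Except(ownedAlbums, new NameComparer()), would not need to change. Exactly which characters count as "symbols" is decided by the runtime's culture data, so it is worth checking a few of your own titles rather than assuming the set matches char.IsPunctuation.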

Peter Duniho answered Oct 14 '22 19:10