C# String Comparison equates to false

I have a string comparison issue that - for the most part - behaves as expected, but is leaving me with a large number of duplicate DB insertions because my code is not detecting the string pairs as duplicates.

I thought I had narrowed it down to a culture issue (Cyrillic characters), which I resolved, but I'm now getting 'false negatives' (two apparently equal strings showing up as not-equal).

I've looked at the similar questions listed below and tried the comparison approaches that follow.

Similar SO questions that I've checked:

Why does my comparison always return false?
C# string equality operator returns false, but I'm pretty sure it should be true... What?
String Equals() method fails even though the two strings are same in C#?
Differences in string compare methods in C#

Here's an example of the strings being compared: (title and description)

feed title: Ellsberg: He's a hero

feed desc: Daniel Ellsberg tells CNN's Don Lemon that NSA leaker Edward Snowden showed courage, has done an enormous service.

db title: Ellsberg: He's a hero

db desc: Daniel Ellsberg tells CNN's Don Lemon that NSA leaker Edward Snowden showed courage, has done an enormous service.

My app compares values fetched from RSS feeds with values I have in the DB and should only insert "new" values.

//fetch existing articles from DB for the current feed:
    List<Article> thisFeedArticles = (from ar in entities.Items
                                      where (ar.ItemTypeId == (int)Enums.ItemType.Article) && ar.ParentId == feed.FeedId
                                      && ar.DatePublished > datelimit
                                      select new Article
                                      {
                                           Title = ar.Title, 
                                           Description = ar.Blurb
                                      }).ToList();

Every one of the comparisons below shows no match for the Ellsberg title/description, i.e. matches to matches6 all have Count() == 0.

(please excuse the enumerated variable names - they are just for testing)

   // comparison methods 
CompareOptions compareOptions = CompareOptions.OrdinalIgnoreCase;
CompareOptions compareOptions2 = CompareOptions.IgnoreSymbols | CompareOptions.IgnoreNonSpace;
//1
IEnumerable<Article> matches = thisFeedArticles.Where(b =>
    String.Compare(b.Title.Trim().Normalize(), a.Title.Trim().Normalize(), CultureInfo.InvariantCulture, compareOptions) == 0 &&
    String.Compare(b.Description.Trim().Normalize(), a.Description.Trim().Normalize(), CultureInfo.InvariantCulture, compareOptions) == 0
    );

//2
IEnumerable<Article> matches2 = thisFeedArticles.Where(b =>
    String.Compare(b.Title, a.Title, CultureInfo.CurrentCulture, compareOptions2) == 0 &&
    String.Compare(b.Description, a.Description, CultureInfo.CurrentCulture, compareOptions2) == 0
    );

//3
IEnumerable<Article> matches3 = thisFeedArticles.Where(b =>
    String.Compare(b.Title, a.Title, StringComparison.OrdinalIgnoreCase) == 0 &&
    String.Compare(b.Description, a.Description, StringComparison.OrdinalIgnoreCase) == 0
    );

//4
IEnumerable<Article> matches4 = thisFeedArticles.Where(b =>
    b.Title.Equals(a.Title, StringComparison.OrdinalIgnoreCase) &&
    b.Description.Equals(a.Description, StringComparison.OrdinalIgnoreCase)
    );

//5
IEnumerable<Article> matches5 = thisFeedArticles.Where(b =>
    b.Title.Trim().Equals(a.Title.Trim(), StringComparison.InvariantCultureIgnoreCase) &&
    b.Description.Trim().Equals(a.Description.Trim(), StringComparison.InvariantCultureIgnoreCase)
    );

//6
IEnumerable<Article> matches6 = thisFeedArticles.Where(b =>
    b.Title.Trim().Normalize().Equals(a.Title.Trim().Normalize(), StringComparison.OrdinalIgnoreCase) &&
    b.Description.Trim().Normalize().Equals(a.Description.Trim().Normalize(), StringComparison.OrdinalIgnoreCase)
    );


// matches7 is defined under UPDATE 2 below
if (matches.Count() == 0 && matches2.Count() == 0 && matches3.Count() == 0 && matches4.Count() == 0 && matches5.Count() == 0 && matches6.Count() == 0 && matches7.Count() == 0)
{
    //insert values
}

//this if statement was the first approach
//if (!thisFeedArticles.Any(b => b.Title == a.Title && b.Description == a.Description))
// {
// insert
// }

Obviously I have only been using one of the above options at a time.

For the most part, the above options do work and most duplicates are detected, but there are still duplicates slipping through the cracks - I just need to understand what the "cracks" are, so any suggestions would be most welcome.

I did even try converting the strings to byte arrays and comparing those (deleted that code a while ago, sorry).

The Article class is as follows:

    public class Article
    {
        public string Title;
        public string Description;
    }

UPDATE:

I've tried normalizing the strings as well as including the IgnoreSymbols CompareOption, and I am still getting a false negative (non-match). What I am noticing, though, is that apostrophes make a consistent appearance in the false non-matches, so I'm thinking it might be a case of apostrophe vs single quote, i.e. ' vs ’ (and the like), but surely IgnoreSymbols should avoid that?
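One way to test the apostrophe hypothesis is to print the code point at the first position where the two strings differ, rather than eyeballing them. The following is a minimal sketch, not code from the original post: `FirstDifference` and `CodeUnit` are hypothetical helper names, and the DB value containing U+2019 is an assumption made purely for illustration. It uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;

// Hypothetical helper: index of the first UTF-16 code unit at which the
// two strings differ, or -1 if they are identical.
static int FirstDifference(string x, string y)
{
    int shared = Math.Min(x.Length, y.Length);
    for (int i = 0; i < shared; i++)
    {
        if (x[i] != y[i]) return i;
    }
    return x.Length == y.Length ? -1 : shared;
}

// Hypothetical helper: formats the code unit at index i, e.g. "U+0027".
static string CodeUnit(string s, int i) =>
    i < s.Length ? $"U+{(int)s[i]:X4}" : "(end of string)";

string feedTitle = "Ellsberg: He's a hero";       // ASCII apostrophe, U+0027
string dbTitle   = "Ellsberg: He\u2019s a hero";  // assumed: right single quotation mark, U+2019

int at = FirstDifference(feedTitle, dbTitle);
Console.WriteLine(at < 0
    ? "identical"
    : $"differ at index {at}: {CodeUnit(feedTitle, at)} vs {CodeUnit(dbTitle, at)}");
// prints: differ at index 12: U+0027 vs U+2019
```

Running this against a real feed/DB pair that fails to match should show which character is responsible, whether that is a quote variant, a non-breaking space, or something else entirely.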

I found a couple more similar SO posts:

C# string comparison ignoring spaces, carriage return or line breaks
String comparison: InvariantCultureIgnoreCase vs OrdinalIgnoreCase?

Next step: try using regex to strip white space as per this answer: https://stackoverflow.com/a/4719009/2261245

UPDATE 2:

After all 6 comparisons STILL returned no matches, I realised that there had to be another factor skewing the results, so I tried the following:

//7
IEnumerable<Article> matches7 = thisFeedArticles.Where(b =>
    Regex.Replace(b.Title, "[^0-9a-zA-Z]+", "").Equals(Regex.Replace(a.Title, "[^0-9a-zA-Z]+", ""), StringComparison.InvariantCultureIgnoreCase) &&
    Regex.Replace(b.Description, "[^0-9a-zA-Z]+", "").Equals(Regex.Replace(a.Description, "[^0-9a-zA-Z]+", ""), StringComparison.InvariantCultureIgnoreCase)
    );

This DOES find the matches the others miss!
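One caveat with the pattern in comparison 7 is that it removes every character outside ASCII letters and digits, so two different titles written entirely in another script (the Cyrillic case mentioned earlier, for example) both reduce to an empty string and compare as equal. The sketch below illustrates this with two made-up Cyrillic titles. `AsciiOnly` and `LettersAndDigits` are hypothetical helper names, and the `\p{L}\p{N}` variant is an assumption offered for comparison, not something from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as comparison 7: keep only ASCII letters and digits.
static string AsciiOnly(string s) => Regex.Replace(s, "[^0-9a-zA-Z]+", "");

// Illustrative variant (an assumption): keep letters and digits of any script,
// dropping only punctuation, symbols and whitespace.
static string LettersAndDigits(string s) => Regex.Replace(s, @"[^\p{L}\p{N}]+", "");

string first  = "Привет, мир";   // two different, made-up Cyrillic titles
string second = "Добрый день";

Console.WriteLine(AsciiOnly(first) == AsciiOnly(second));               // True: both become ""
Console.WriteLine(LettersAndDigits(first) == LettersAndDigits(second)); // False
```

The trade-off is that the Unicode-aware variant keeps letters such as β and ß, so it would no longer treat titles that differ only in such a letter as duplicates.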

The string below got through all 6 comparisons, but not number 7:

a.Title.Trim().Normalize() and a.Title.Trim() both return:

"Corrigendum: Identification of a unique TGF-β–dependent molecular and functional signature in microglia"

Value in the DB is:

"Corrigendum: Identification of a unique TGF-ß–dependent molecular and functional signature in microglia"

Closer inspection shows that the character after "TGF-" differs between the two values: the feed has the Greek letter beta, β (U+03B2), while the DB has the German eszett, ß (U+00DF).

I would have expected at least one of comparisons 1-6 to pick that up...
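A likely explanation is that β (U+03B2) and ß (U+00DF) are two distinct characters, not two encodings of the same one, so neither case-insensitive ordinal comparison nor Unicode normalization makes them equal. Comparison 7 matches only because its pattern deletes both letters. The following sketch uses a shortened stand-in for the title and checks the ordinal, NFKC-normalized and regex-stripped forms; culture-sensitive comparisons are left out because their results can depend on the platform's globalization library.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

string feed = "TGF-\u03B2";  // Greek small letter beta, as in the feed value
string db   = "TGF-\u00DF";  // Latin small letter sharp s (eszett), as in the DB value

// Ordinal comparison works on code units, so the two letters differ.
bool ordinal = feed.Equals(db, StringComparison.OrdinalIgnoreCase);

// Normalization unifies different encodings of the same character;
// beta and eszett are different characters, so even NFKC keeps them apart.
bool normalized = feed.Normalize(NormalizationForm.FormKC)
    .Equals(db.Normalize(NormalizationForm.FormKC), StringComparison.Ordinal);

// Comparison 7 deletes both letters (neither is in [0-9a-zA-Z]),
// so only "TGF" is left on each side.
bool stripped = Regex.Replace(feed, "[^0-9a-zA-Z]+", "")
    .Equals(Regex.Replace(db, "[^0-9a-zA-Z]+", ""), StringComparison.OrdinalIgnoreCase);

Console.WriteLine($"{ordinal} {normalized} {stripped}"); // False False True
```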

Interestingly, after some performance comparisons, the Regex option is by no means the slowest of the seven. Normalize appears to be considerably more expensive than the regular expression! Here are the Stopwatch durations for all seven when thisFeedArticles contains 12077 items:

Time elapsed: 00:00:00.0000662
Time elapsed: 00:00:00.0000009
Time elapsed: 00:00:00.0000009
Time elapsed: 00:00:00.0000009
Time elapsed: 00:00:00.0000009
Time elapsed: 00:00:00.0000009
Time elapsed: 00:00:00.0000016
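These durations are worth double-checking. LINQ's `Where` is evaluated lazily, so if the Stopwatch wrapped only the `Where(...)` call (an assumption, since the timing code is not shown), the figures reflect building the query rather than running the comparisons. The sketch below uses made-up stand-in data to show that the predicate is not invoked until the sequence is enumerated, for instance by `Count()`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Made-up stand-in data; 12077 mirrors the item count mentioned above.
List<string> titles = Enumerable.Range(0, 12077).Select(i => "Title " + i).ToList();
int predicateCalls = 0;

var sw = Stopwatch.StartNew();
IEnumerable<string> lazy = titles.Where(t =>
{
    predicateCalls++;
    return t.Normalize().Equals("Title 42", StringComparison.OrdinalIgnoreCase);
});
sw.Stop();
int callsAfterWhere = predicateCalls;   // still 0: nothing has been compared yet
Console.WriteLine($"Where only: {sw.Elapsed}, predicate calls: {callsAfterWhere}");

sw.Restart();
int found = lazy.Count();               // enumeration runs the predicate once per item
sw.Stop();
Console.WriteLine($"Count():    {sw.Elapsed}, predicate calls: {predicateCalls}, matches: {found}");
```

Timing the enumeration step (or calling `ToList()` inside the timed region) should give a fairer comparison between the seven approaches.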

asked Sep 12 '14 13:09

Adam Hey

1 Answer

Unicode strings can be "binary" different, even if they are "semantically" the same.

Try normalizing your strings. For more information, see http://msdn.microsoft.com/en-us/library/System.String.Normalize.aspx
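As a small illustration of that point, the sketch below compares two made-up strings that render identically but store the accented letter differently. They are unequal under an ordinal comparison and equal once both are normalized (`Normalize()` with no argument uses form C).

```csharp
using System;

string composed   = "caf\u00E9";   // 'é' as a single code point (U+00E9)
string decomposed = "cafe\u0301";  // 'e' followed by a combining acute accent (U+0301)

bool rawEqual = string.Equals(composed, decomposed, StringComparison.Ordinal);
bool normalizedEqual = string.Equals(
    composed.Normalize(), decomposed.Normalize(), StringComparison.Ordinal);

Console.WriteLine($"{rawEqual} {normalizedEqual}"); // False True
```

Normalization only helps when the two strings contain the same characters in different encodings; it does not equate characters that are genuinely different.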

answered Sep 28 '22 17:09

Kris Vandermotten

Tags: c#, .net, string-comparison
