
How to find and remove duplicate objects in a collection using LINQ?

I have a simple class representing an object. It has 5 properties (a date, 2 decimals, an integer and a string). I have a collection class, derived from CollectionBase, which is a container class for holding multiple objects from my first class.

My question is, I want to remove duplicate objects (e.g. objects that have the same date, same decimals, same integers and same string). Is there a LINQ query I can write to find and remove duplicates? Or find them at the very least?

Asked by Icemanind on Jul 13 '10 at 17:07



2 Answers

You can remove duplicates using the Distinct operator.

There are two overloads. The first uses the default equality comparer for your type, which for a custom type calls the type's Equals() and GetHashCode() methods (reference equality unless you override them). The second lets you supply your own equality comparer. Neither overload modifies your initial collection; both return a new sequence that excludes duplicates.
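
As a minimal sketch of the first overload, the example below overrides Equals and GetHashCode so that Distinct compares by value. The Reading class and its property names are hypothetical stand-ins for the five-property class described in the question (a date, two decimals, an integer and a string).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the class in the question.
public sealed class Reading : IEquatable<Reading>
{
    public DateTime Date { get; set; }
    public decimal Low { get; set; }
    public decimal High { get; set; }
    public int Count { get; set; }
    public string Label { get; set; }

    // Two readings are equal when all five properties match.
    public bool Equals(Reading other)
    {
        return other != null
            && Date == other.Date
            && Low == other.Low
            && High == other.High
            && Count == other.Count
            && Label == other.Label;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Reading);
    }

    // Equal objects must return equal hash codes, so hash the same fields.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + Date.GetHashCode();
            hash = hash * 31 + Low.GetHashCode();
            hash = hash * 31 + High.GetHashCode();
            hash = hash * 31 + Count;
            hash = hash * 31 + (Label == null ? 0 : Label.GetHashCode());
            return hash;
        }
    }
}

public static class DistinctDemo
{
    // Three readings, the second being an exact duplicate of the first.
    public static List<Reading> Sample()
    {
        var day = new DateTime(2010, 7, 13);
        return new List<Reading>
        {
            new Reading { Date = day, Low = 1.5m, High = 2.5m, Count = 3, Label = "a" },
            new Reading { Date = day, Low = 1.5m, High = 2.5m, Count = 3, Label = "a" },
            new Reading { Date = day, Low = 1.5m, High = 2.5m, Count = 3, Label = "b" },
        };
    }

    public static void Main()
    {
        List<Reading> list = Sample();
        List<Reading> unique = list.Distinct().ToList();
        Console.WriteLine("original: " + list.Count + ", distinct: " + unique.Count);
    }
}
```

Distinct is lazily evaluated, so the ToList() call is what actually produces the de-duplicated copy; the original list keeps all of its elements.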

If you want to just find the duplicates, you can use GroupBy to do so:

var groupsWithDups = list.GroupBy( x => new { A = x.A, B = x.B, ... }, x => x ) 
                         .Where( g => g.Count() > 1 );
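
A runnable sketch of the GroupBy approach might look like the following. The Item class and its two properties are assumptions made for illustration; the anonymous-type key works because anonymous types compare by the values of their properties, so add every property that should take part in the comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item with just two properties; extend the key for more.
public sealed class Item
{
    public DateTime Date { get; set; }
    public string Name { get; set; }
}

public static class DuplicateFinder
{
    // Returns one list per key that occurs more than once.
    public static List<List<Item>> FindDuplicateGroups(IEnumerable<Item> items)
    {
        return items
            .GroupBy(x => new { x.Date, x.Name }) // key compared by value
            .Where(g => g.Count() > 1)            // keep only repeated keys
            .Select(g => g.ToList())
            .ToList();
    }

    public static List<Item> Sample()
    {
        var d1 = new DateTime(2010, 7, 13);
        var d2 = new DateTime(2010, 7, 14);
        return new List<Item>
        {
            new Item { Date = d1, Name = "a" },
            new Item { Date = d1, Name = "b" },
            new Item { Date = d1, Name = "a" }, // duplicate of the first item
            new Item { Date = d2, Name = "a" },
        };
    }

    public static void Main()
    {
        foreach (List<Item> group in FindDuplicateGroups(Sample()))
        {
            Console.WriteLine(group[0].Name + " appears " + group.Count + " times");
        }
    }
}
```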

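To remove duplicates in place from a List<T> (RemoveAll is defined on List<T>, not on IList<T>, and takes a predicate rather than a sequence), you could do:

var seen = new HashSet<T>();                // T is the element type of yourList
yourList.RemoveAll( x => !seen.Add( x ) );  // Add returns false for an item already seen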
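
For in-place removal, one common sketch pairs List<T>.RemoveAll, which takes a predicate, with a HashSet<T> that remembers the items already seen. The helper name RemoveDuplicates and its optional comparer parameter are assumptions for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class InPlaceDedup
{
    // Removes later occurrences of items already seen and returns how many
    // were removed. Passing null for the comparer uses the default equality.
    // Which occurrence survives depends on RemoveAll visiting elements in
    // index order, which current implementations do.
    public static int RemoveDuplicates<T>(List<T> list, IEqualityComparer<T> comparer = null)
    {
        var seen = new HashSet<T>(comparer);
        // HashSet<T>.Add returns false when the item is already present.
        return list.RemoveAll(item => !seen.Add(item));
    }

    public static void Main()
    {
        var names = new List<string> { "a", "b", "a", "c", "b" };
        int removed = RemoveDuplicates(names);
        Console.WriteLine(removed + " removed, left: " + string.Join(",", names));
    }
}
```

Unlike Distinct, this modifies the list itself, so any other code holding a reference to the same list sees the change.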
Answered by LBushkin on Oct 05 '22 at 10:10


If your simple class implements Equals (and a matching GetHashCode) in a manner that satisfies your requirements, then you can use the Distinct method:

var col = ...;
var noDupes = col.Distinct();

If not, then you will need to provide an instance of IEqualityComparer<T> which compares values in the way you desire. For example (null problems ignored for brevity):

public class MyTypeComparer : IEqualityComparer<MyType> {
  public bool Equals(MyType left, MyType right) {
    return left.Name == right.Name;
  }
  public int GetHashCode(MyType type) {
    return 42;
  }
}

var noDupes = col.Distinct(new MyTypeComparer());

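Note that the use of a constant for GetHashCode is intentional. Without knowing intimate details about the semantics of MyType, it is impossible to write an efficient and correct hashing function. In lieu of an efficient hashing function, I used a constant, which is correct irrespective of the semantics of the type. The trade-off is speed: every element lands in the same hash bucket, so Distinct has to compare each element against all distinct elements found so far.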
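
If the semantics are known, the hash can be derived from the same data that Equals compares. The sketch below assumes that Name is the only property that defines equality for MyType; the MyType definition and the MyTypeNameComparer name are hypothetical, and null handling is included here rather than ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical MyType: only the Name property is assumed to matter.
public sealed class MyType
{
    public string Name { get; set; }
}

public sealed class MyTypeNameComparer : IEqualityComparer<MyType>
{
    public bool Equals(MyType left, MyType right)
    {
        if (ReferenceEquals(left, right)) return true;
        if (left == null || right == null) return false;
        return left.Name == right.Name;
    }

    // Hash exactly the data that Equals compares, so that equal
    // items always produce equal hash codes.
    public int GetHashCode(MyType item)
    {
        if (item == null || item.Name == null) return 0;
        return item.Name.GetHashCode();
    }
}

public static class ComparerDemo
{
    public static int CountDistinct(IEnumerable<MyType> items)
    {
        return items.Distinct(new MyTypeNameComparer()).Count();
    }

    public static void Main()
    {
        var col = new List<MyType>
        {
            new MyType { Name = "x" },
            new MyType { Name = "y" },
            new MyType { Name = "x" }, // same Name as the first item
        };
        Console.WriteLine(CountDistinct(col));
    }
}
```

The rule that matters is consistency: if Equals is later changed to compare more properties, GetHashCode has to be updated to use a subset of those same properties.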

Answered by JaredPar on Oct 05 '22 at 10:10