Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select distinct values from a list using LINQ in C#

Tags:

c#

.net

linq

I have a collection of Employee

Class Employee

{
  empName
  empID
  empLoc 
  empPL
  empShift
}

My list contains

 empName,empID,empLoc,empPL,empShift
    E1,1,L1,EPL1,S1 
    E2,2,L2,EPL2,S2
    E3,3,L3,EPL3,S3
    E4,4,L1,EPL1,S1
    E5,5,L5,EPL5,S5
        E6,6,L2,EPL2,S2

I need to take the employees having distinct values empLoc,empPL,empShift.

Is there is any way to achieve this using LINQ ?

like image 343
kbvishnu Avatar asked Sep 25 '12 09:09

kbvishnu


People also ask

How do I get distinct on a single column in LINQ?

distinct in Linq to get result based on one field of the table (so do not require a whole duplicated records from table). I know writing basic query using distinct as followed: var query = (from r in table1 orderby r. Text select r).

Why distinct is not working in LINQ?

LINQ Distinct is not that smart when it comes to custom objects. All it does is look at your list and see that it has two different objects (it doesn't care that they have the same values for the member fields). One workaround is to implement the IEquatable interface as shown here.

What is distinct in LINQ C#?

C# Linq Distinct() method removes the duplicate elements from a sequence (list) and returns the distinct elements from a single data source. It comes under the Set operators' category in LINQ query operators, and the method works the same way as the DISTINCT directive in Structured Query Language (SQL).


4 Answers

You can use GroupBy with anonymous type, and then get First:

list.GroupBy(e => new { 
                          empLoc = e.empLoc, 
                          empPL = e.empPL, 
                          empShift = e.empShift 
                       })

    .Select(g => g.First());
like image 194
cuongle Avatar answered Oct 04 '22 17:10

cuongle


You could implement a custom IEqualityComparer<Employee>:

public class Employee
{
    public string empName { get; set; }
    public string empID { get; set; }
    public string empLoc { get; set; }
    public string empPL { get; set; }
    public string empShift { get; set; }

    public class Comparer : IEqualityComparer<Employee>
    {
        public bool Equals(Employee x, Employee y)
        {
            return x.empLoc == y.empLoc
                && x.empPL == y.empPL
                && x.empShift == y.empShift;
        }

        public int GetHashCode(Employee obj)
        {
            unchecked  // overflow is fine
            {
                int hash = 17;
                hash = hash * 23 + (obj.empLoc ?? "").GetHashCode();
                hash = hash * 23 + (obj.empPL ?? "").GetHashCode();
                hash = hash * 23 + (obj.empShift ?? "").GetHashCode();
                return hash;
            }
        }
    }
}

Now you can use this overload of Enumerable.Distinct:

var distinct = employees.Distinct(new Employee.Comparer());

The less reusable, robust and efficient approach, using an anonymous type:

var distinctKeys = employees.Select(e => new { e.empLoc, e.empPL, e.empShift })
                            .Distinct();
var joined = from e in employees
             join d in distinctKeys
             on new { e.empLoc, e.empPL, e.empShift } equals d
             select e;
// if you want to replace the original collection
employees = joined.ToList();
like image 44
Tim Schmelter Avatar answered Oct 04 '22 15:10

Tim Schmelter


You can try with this code

var result =  (from  item in List
              select new 
              {
                 EmpLoc = item.empLoc,
                 EmpPL= item.empPL,
                 EmpShift= item.empShift
              })
              .ToList()
              .Distinct();
like image 28
Aghilas Yakoub Avatar answered Oct 04 '22 15:10

Aghilas Yakoub


I was curious about which method would be faster:

  1. Using Distinct with a custom IEqualityComparer or
  2. Using the GroupBy method described by Cuong Le.

I found that depending on the size of the input data and the number of groups, the Distinct method can be a lot more performant. (as the number of groups tends towards the number of elements in the list, distinct runs faster).

Code runs in LinqPad!

    void Main()
    {
        List<C> cs = new List<C>();
        foreach(var i in Enumerable.Range(0,Int16.MaxValue*1000))
        {
            int modValue = Int16.MaxValue; //vary this value to see how the size of groups changes performance characteristics. Try 1, 5, 10, and very large numbers
            int j = i%modValue; 
            cs.Add(new C{I = i, J = j});
        }
        cs.Count ().Dump("Size of input array");

        TestGrouping(cs);
        TestDistinct(cs);
    }

    public void TestGrouping(List<C> cs)
    {
        Stopwatch sw = Stopwatch.StartNew();
        sw.Restart();
        var groupedCount  = cs.GroupBy (o => o.J).Select(s => s.First()).Count();
        groupedCount.Dump("num groups");
        sw.ElapsedMilliseconds.Dump("elapsed time for using grouping");
    }

    public void TestDistinct(List<C> cs)
    {
        Stopwatch sw = Stopwatch.StartNew();
        var distinctCount = cs.Distinct(new CComparerOnJ()).Count ();
        distinctCount.Dump("num distinct");
        sw.ElapsedMilliseconds.Dump("elapsed time for using distinct");
    }

    public class C
    {
        public int I {get; set;}
        public int J {get; set;}
    }

    public class CComparerOnJ : IEqualityComparer<C>
    {
        public bool Equals(C x, C y)
        {
            return x.J.Equals(y.J);
        }

        public int GetHashCode(C obj)
        {
            return obj.J.GetHashCode();
        }
    }
like image 40
Raj Rao Avatar answered Oct 04 '22 16:10

Raj Rao