Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

LINQ Query: Determining if object in one list exists in another based on key

Tags:

c#

linq

Basically what I'm trying to do is take a list of objects and filter it based on a number of criteria where one of the criteria is that the key doesn't exist in another list. Here's an example:
My two classes are similar to this:

public class Test  
{  
  public string name;  
  public string instructor_name;  
  public string course;  
}  

public class Appointment  
{  
  public string site;
  public DateTime forWhen;
  public string testName;
}

I want to sort through a List<Test> by looking at the course and making sure that test doesn't exist in the List<Appointment>. In SQL I'd do it something like this:

SELECT new Group<Test>(c.Key, c)
FROM tests in testList
WHERE tests.Course != "Science"
AND tests.name NOT IN (SELECT testName FROM appotList)

However, I can't figure out how I would do this in LINQ. Any ideas?

like image 804
legacybass Avatar asked Jan 30 '11 21:01

legacybass


2 Answers

If you're talking about performing client-side filtering, that's pretty easy with LINQ to Objects. Something like this:

List<Test> tests = ...;
List<Appointment> appts = ...;

var query = tests.Except(
            tests.Join(appts, t => t.name, a => a.testName, (t, a) => t));

The following is slightly simpler to read:

var query = tests.Where(t => !appts.Any(a => a.testName == t.name));

But the first version will be faster, since the Join function will compute a hash table of the matches rather than doing a linear search of the appts list for every element in tests.

like image 106
Adam Robinson Avatar answered Oct 13 '22 00:10

Adam Robinson


http://introducinglinq.com/blogs/marcorusso/archive/2008/01/14/the-not-in-clause-in-linq-to-sql.aspx

Consider this code that returns all the customers who don't have an order in the Orders table. This is one SQL query that returns that value.

SELECT *
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[CustomerID] NOT IN (
    SELECT [t1].[CustomerID]
    FROM [dbo].[Orders] AS [t1]
)

This is not the faster way to get the desired result (using a NOT EXISTS is the favorite way - more on this shortly). LINQ offers a Contains extension method that allows writing the following code.

NorthwindDataContext dc = new NorthwindDataContext();
dc.Log = Console.Out;
var query =
    from c in dc.Customers
    where !(from o in dc.Orders
            select o.CustomerID)
           .Contains(c.CustomerID)
    select c;
foreach (var c in query) Console.WriteLine( c );

In LINQ to SQL the query is translated into this SQL code:

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName],
       [t0].[ContactTitle], [t0].[Address], [t0].[City],
       [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
WHERE NOT (EXISTS(
    SELECT NULL AS [EMPTY]
    FROM [dbo].[Orders] AS [t1]
    WHERE [t1].[CustomerID] = [t0].[CustomerID]
    ))

This approach is not only semantically equivalent, but also faster in execution. The following is the result with SET STATISTICS IO ON. The first result is for the hand-written query that use the NOT IN clause. The second result is for the LINQ to SQL generated query.

like image 40
Jakub Konecki Avatar answered Oct 12 '22 22:10

Jakub Konecki