
Join to an in-memory list efficiently

In EF, if I have a list of primitives (e.g. a List<int>), "joining" that against a table is easy:

var ids = new int[] { 1, 4, 6 }; // some random values
var rows = context.SomeTable.Where(r => ids.Contains(r.id));

This gets much more complicated the instant you want to join on multiple columns:

var keys = something.Select(s => new { s.Field1, s.Field2 });
var rows = context.SomeTable.Where(r => keys.Contains(new { r.Field1, r.Field2 })); // this won't work

I've found two ways to join it, but neither is great:

  1. Pull in the entire table and filter it in memory against the other data (this gets slow if the table is really large)
  2. Query the table once for each key (this gets slow if you have a decent number of rows to pull in)

Sometimes, the compromise I've been able to make is a modified #1: pulling in a subset of the table based on a fairly unique key:

var keys = something.Select(s => s.Field1);
var rows = context.SomeTable.Where(r => keys.Contains(r.Field1)).ToList();
foreach (var sRow in something)
{
    var joinResult = rows.Where(r => r.Field1 == sRow.Field1 && r.Field2 == sRow.Field2);
    //do stuff
}

But even this could pull back too much data.
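The in-memory half of that compromise can avoid rescanning `rows` for every item. This is only a sketch: `Record`, `LocalJoin`, and `Match` are hypothetical names, and it assumes both key fields are ints. It groups the fetched rows by the composite key once and then looks each local item up.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape used for both the local items and the fetched rows.
public class Record
{
    public int Field1 { get; set; }
    public int Field2 { get; set; }
    public string Payload { get; set; }
}

public static class LocalJoin
{
    // Groups the fetched rows by (Field1, Field2) once, so each local item
    // is matched with a hash lookup instead of a scan over all rows.
    public static List<(Record Local, Record Row)> Match(
        IEnumerable<Record> something, IEnumerable<Record> rows)
    {
        var byKey = rows.ToLookup(r => (r.Field1, r.Field2));

        // A lookup returns an empty sequence for a missing key,
        // so items without a match simply produce no pairs.
        return something
            .SelectMany(s => byKey[(s.Field1, s.Field2)],
                        (s, r) => (Local: s, Row: r))
            .ToList();
    }
}
```

This does not reduce how much data comes back from the database; it only makes the matching step cheaper once the rows are in memory.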

I know there are ways to coax table-valued parameters into ADO.NET, and that I can build a series of .Where() clauses that are OR'd together. Does anyone have any magic bullets?
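For reference, the OR'd-together idea can be sketched with expression trees. `Row`, `KeyFilter`, and `MatchAny` are hypothetical names, and the sketch assumes both key columns are ints. It builds a single predicate of the form (Field1 == a && Field2 == b) || ... that a LINQ provider could translate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity standing in for a row of SomeTable.
public class Row
{
    public int Field1 { get; set; }
    public int Field2 { get; set; }
}

public static class KeyFilter
{
    // Builds r => (r.Field1 == k1.Field1 && r.Field2 == k1.Field2) || ... for all keys.
    public static Expression<Func<Row, bool>> MatchAny(
        IEnumerable<(int Field1, int Field2)> keys)
    {
        ParameterExpression r = Expression.Parameter(typeof(Row), "r");
        Expression body = null;

        foreach (var key in keys)
        {
            // One "both columns match" clause per key.
            Expression pair = Expression.AndAlso(
                Expression.Equal(Expression.Property(r, nameof(Row.Field1)),
                                 Expression.Constant(key.Field1)),
                Expression.Equal(Expression.Property(r, nameof(Row.Field2)),
                                 Expression.Constant(key.Field2)));

            body = body == null ? pair : Expression.OrElse(body, pair);
        }

        // With no keys at all, the predicate matches nothing.
        return Expression.Lambda<Func<Row, bool>>(
            body ?? Expression.Constant(false), r);
    }
}
```

Assuming context.SomeTable is an IQueryable<Row>, this would be used as context.SomeTable.Where(KeyFilter.MatchAny(keys)). Each key adds another branch to the generated SQL, so for large key sets it is probably worth sending the keys in batches.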

Adam Tegen asked Sep 07 '13 19:09



2 Answers

Instead of a .Contains(), how about you use an inner join and "filter" that way:

var rows = from s in context.SomeTable
           join k in keys on new { s.Field1, s.Field2 } equals new { k.Field1, k.Field2 }
           select s;

There may be a typo in the above, but you get the idea...

Stefan answered Oct 19 '22 19:10


I had exactly the same problem, and the solutions I came up with were:

  • Naive: do a separate query for each local record
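  • Smarter: build two lists, one of the unique Field1 values and one of the unique Field2 values, and query using two Contains expressions. You then have to filter the result a second time in memory, because the query can return rows whose (Field1, Field2) combination is not in the local list.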

Looks like this:

var unique1 = something.Select(x => x.Field1).Distinct().ToList();
var unique2 = something.Select(x => x.Field2).Distinct().ToList();
var priceData = rows.Where(x => unique1.Contains(x.Field1) && unique2.Contains(x.Field2));

  • The next one is my own solution, which I called BulkSelect. The idea behind it is:

    • Create temp table using direct SQL command
    • Upload data for SELECT command to that temp table
    • Intercept and modify SQL which was generated by EF.

I did it for Postgres, but it could be ported to MSSQL if needed. It is nicely described here and the source code is here
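The second filtering pass needed by the "smarter" approach can be sketched as follows. `Row`, `ExactFilter`, and `Apply` are hypothetical names, and int key columns are an assumption. The two-Contains query matches any combination of the listed values, so local keys (1, 10) and (2, 20) would also bring back a row with (1, 20); the pass below drops such rows.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type returned by the two-Contains query.
public class Row
{
    public int Field1 { get; set; }
    public int Field2 { get; set; }
}

public static class ExactFilter
{
    // Keeps only the candidates whose (Field1, Field2) pair
    // is actually present in the local key list.
    public static List<Row> Apply(
        IEnumerable<Row> candidates, IEnumerable<(int Field1, int Field2)> keys)
    {
        var wanted = new HashSet<(int, int)>(keys);
        return candidates
            .Where(c => wanted.Contains((c.Field1, c.Field2)))
            .ToList();
    }
}
```

Here priceData from the snippet above would be materialized first (for example with ToList()) and passed in as candidates.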

Tony answered Oct 19 '22 19:10