
Working with Cross Context Joins in LINQ-to-SQL

Initially, I wrote this query using LINQ-to-SQL:

var result = from w in PatternDataContext.Windows
    join cf in PatternDataContext.ControlFocus on w.WindowId equals cf.WindowId
    join p in PatternDataContext.Patterns on cf.CFId equals p.CFId
    join r in ResultDataContext.Results on p.PatternId equals r.PatternId
    join fi in ResultDataContext.IclFileInfos on r.IclFileId equals fi.IclFileId
    join sp in sessionProfileDataContext.ServerProfiles on fi.ServerProfileId equals sp.ProfileId
    join u in infrastructure.Users on sp.UserId equals u.Id
    where w.Process.Equals(processName)
    select u.DistributedAppId;

When I executed it and inspected the result in QuickWatch, it showed this message:

the query contains references to items defined on a different data context

After some googling, I found a topic here on Stack Overflow about simulating cross-context joins, and, as suggested there, I changed my query slightly to this:

var result = from w in PatternDataContext.Windows
    join cf in PatternDataContext.ControlFocus on w.WindowId equals cf.WindowId
    join p in PatternDataContext.Patterns on cf.CFId equals p.CFId
    join r in SimulateJoinResults() on p.PatternId equals r.PatternId
    join fi in SimulateJoinIclFileInfos() on r.IclFileId equals fi.IclFileId
    join sp in SimulateJoinServerProfiles() on fi.ServerProfileId equals sp.ProfileId
    join u in SimulateJoinUsers() on sp.UserId equals u.Id
    where w.Process.Equals(processName)
    select u.DistributedAppId;

This query uses these SimulateXyz methods:

private static IQueryable<Result> SimulateJoinResults()
{
  return from r in SessionDataProvider.Instance.ResultDataContext.Results select r;
}
private static IQueryable<IclFileInfo> SimulateJoinIclFileInfos()
{
  return from f in SessionDataProvider.Instance.ResultDataContext.IclFileInfos select f;
}
private static IQueryable<ServerProfile> SimulateJoinServerProfiles()
{
  return from sp in sessionProfileDataContext.ServerProfiles select sp;
}
private static IQueryable<User> SimulateJoinUsers()
{
  return from u in infrastructureDataContext.Users select u;
}

But even this approach didn't solve the problem. QuickWatch still shows the same message:

the query contains references to items defined on a different data context

Is there a solution to this problem? Along with the solution, I would also like to know why the problem still exists and how exactly the new solution removes it, so that next time I can solve such problems myself. I'm new to LINQ, by the way.

asked Mar 25 '11 07:03 by Nawaz


1 Answer

I've had to do this before, and there are two ways to do it.

The first is to move all the servers into a single context. You do this by pointing LINQ-to-SQL at a single server and, on that server, creating linked servers for all the other servers. Then you create views for any tables you're interested in on the other servers and add those views to your context.

The second is to do the joins manually: pull data from one context, then use only the properties you need to filter or join against another context. For example:

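// Query 1 runs against the Results context and brings only the keys into memory
int[] patternIds = SessionDataProvider.Instance.ResultDataContext.Results.Select(o => o.PatternId).ToArray();
// Query 2 runs against the Patterns context; Contains on the local array becomes a SQL IN (...) clause
var results = from p in PatternDataContext.Patterns
              where patternIds.Contains(p.PatternId)
              select p;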
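Applied to the full query in the question, the second approach turns into a chain of single-context queries, each handing an in-memory array of keys to the next. The sketch below is only an illustration under assumptions: the row types are hypothetical stand-ins (C# records) modeled on the entity and property names in the question, `DistributedAppId` is assumed to be a `string`, and `CrossContextLookup.DistributedAppIdsFor` is a made-up name. Each context's tables are passed in as `IQueryable<T>`, so the same code also runs against in-memory data via `AsQueryable()`. As far as I can tell, this is also why the `SimulateXyz` methods don't help: they still return `IQueryable` objects bound to the other contexts, so the combined query still spans several contexts, whereas every query here touches exactly one.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the LINQ-to-SQL entity classes named in the question.
public record Window(int WindowId, string Process);
public record ControlFocus(int CFId, int WindowId);
public record Pattern(int PatternId, int CFId);
public record Result(int PatternId, int IclFileId);
public record IclFileInfo(int IclFileId, int ServerProfileId);
public record ServerProfile(int ProfileId, int UserId);
public record User(int Id, string DistributedAppId);

public static class CrossContextLookup
{
    // Each step queries ONE context and passes the next step a local array of keys.
    public static string[] DistributedAppIdsFor(
        string processName,
        IQueryable<Window> windows, IQueryable<ControlFocus> controlFocus,
        IQueryable<Pattern> patterns,                       // pattern context
        IQueryable<Result> results,
        IQueryable<IclFileInfo> iclFileInfos,               // result context
        IQueryable<ServerProfile> serverProfiles,           // session profile context
        IQueryable<User> users)                             // infrastructure context
    {
        // Step 1: pattern context only.
        int[] patternIds = (from w in windows
                            join cf in controlFocus on w.WindowId equals cf.WindowId
                            join p in patterns on cf.CFId equals p.CFId
                            where w.Process == processName
                            select p.PatternId).Distinct().ToArray();

        // Step 2: result context only; Contains on a local array becomes IN (...).
        int[] profileIds = (from r in results
                            join fi in iclFileInfos on r.IclFileId equals fi.IclFileId
                            where patternIds.Contains(r.PatternId)
                            select fi.ServerProfileId).Distinct().ToArray();

        // Step 3: session profile context only.
        int[] userIds = (from sp in serverProfiles
                         where profileIds.Contains(sp.ProfileId)
                         select sp.UserId).Distinct().ToArray();

        // Step 4: infrastructure context only.
        return (from u in users
                where userIds.Contains(u.Id)
                select u.DistributedAppId).ToArray();
    }
}
```

Note that `Contains` behaves like a semi-join: each matching user's `DistributedAppId` comes back once, whereas the original chain of joins could return it once per matching joined row. The `Distinct()` calls only keep the intermediate key arrays short.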

Although the first approach is easier to work with, it has its share of problems: you're relying on SQL Server to perform well with linked servers, something it is notoriously bad at. For example, consider this query:

var results = from p in DataContext.Patterns
              join r in DataContext.LinkedServerResults on p.PatternId equals r.PatternId
              where r.userId == 10
              select p;

When you enumerate this query, the following will occur (let's call the normal and linked servers MyServer and MyLinkedServer, respectively):

  1. MyServer asks MyLinkedServer for the Results
  2. MyLinkedServer sends the Results back to MyServer
  3. MyServer takes those Results, joins them on the Patterns table, and returns only the ones with Results.userId = 10

So now the question is: where is the filtering done, on MyServer or on MyLinkedServer? In my experience, for a query this simple, it will usually be done on MyLinkedServer. However, once the query gets more complicated, you'll suddenly find that MyServer is requesting the entire Results table from MyLinkedServer and filtering after the join! This wastes bandwidth and, if the Results table is large enough, could turn a 50 ms query into a 50-second query!

You could fix slow cross-server joins with stored procedures, but if you do a lot of complex cross-server joins, you may end up writing stored procedures for most of your queries. That is a lot of work and defeats part of the purpose of using L2SQL in the first place (not having to write a lot of SQL).

In comparison, the following code would always perform the filtering on the server containing the Results table:

int[] patternIds = (from r in SessionDataProvider.Instance.ResultDataContext.Results
                    where r.userId == 10
                    select r.PatternId).ToArray();
var results = from p in PatternDataContext.Patterns
              where patternIds.Contains(p.PatternId)
              select p;
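One caveat with the `Contains` approach: LINQ-to-SQL sends each element of the local array as a separate SQL parameter, and SQL Server rejects requests with more than 2,100 parameters, so a very large `patternIds` array can make the second query fail. A minimal workaround, sketched below under my own assumptions (the helper name `WhereKeyIn` and the batch size of 2,000 are arbitrary choices, not part of LINQ-to-SQL), is to split the keys into batches, run one query per batch, and concatenate the results in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkedContains
{
    // Hypothetical default: stays below SQL Server's 2,100-parameter limit and
    // leaves headroom for the query's other parameters.
    public const int DefaultBatchSize = 2000;

    // Runs `query` once per batch of keys and concatenates the results in memory.
    // With LINQ-to-SQL, each call to `query` is a separate round trip to the server.
    public static List<TResult> WhereKeyIn<TKey, TResult>(
        IReadOnlyList<TKey> keys,
        Func<TKey[], IEnumerable<TResult>> query,
        int batchSize = DefaultBatchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var all = new List<TResult>();
        for (int start = 0; start < keys.Count; start += batchSize)
        {
            // Copy the next slice of keys into its own array.
            int size = Math.Min(batchSize, keys.Count - start);
            var batch = new TKey[size];
            for (int i = 0; i < size; i++) batch[i] = keys[start + i];

            // Enumerating the per-batch query executes it.
            all.AddRange(query(batch));
        }
        return all;
    }
}
```

For example, `ChunkedContains.WhereKeyIn(patternIds, batch => from p in PatternDataContext.Patterns where batch.Contains(p.PatternId) select p)` would replace the single `Contains` query. Ordering, `Distinct`, or paging inside the per-batch query only applies within each batch, so apply those in memory afterwards if you need them across the whole result.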

Which approach is best for your situation is up to your own judgement.


Note that there is a third potential solution that I didn't mention above, because it isn't really a programmer's solution: you could ask your server admins to set up a replication task that copies the necessary data from MyLinkedServer to MyServer once a day/week/month. This is only an option if:

  • Your program can work with slightly stale data from MyLinkedServer
  • You only need to read from MyLinkedServer, never write to it
  • The tables you need from MyLinkedServer are not exorbitantly huge
  • You have the space/bandwidth available
  • Your database admins are not stingy/lazy
answered Sep 23 '22 22:09 by BlueRaja - Danny Pflughoeft