I've got a strange problem regarding linq-to-sql, and i've really tried search around for it. Im designing a sql database and where have just recently tried to retrieve an object from it.
The problem is with multiple joins. All my tables use identity-columns as primary keys.
Db designed as followed:
MasterTable : Id (primary key, identity column, int), MasterColumn1 (nvarchar(50))
Slave1: Id (primary key, identity column, int), MasterId (int, primary key -> MasterTable Id), SlaveCol1
Slave2: Id (primary key, identity column, int), MasterId (int, primary key -> MasterTable Id), SlaveColumn2
code used:
var db = new TestDbDataContext() { Log = Console.Out };
var res = from f in db.MasterTables
where f.MasterColumn1 == "wtf"
select new
{
f.Id,
SlaveCols1 = f.Slave1s.Select(s => s.SlaveCol1),
SlaveCols2 = f.Slave2s.Select(s => s.SlaveColumn2)
};
foreach (var re in res)
{
Console.Out.WriteLine(
re.Id + " "
+ string.Join(", ", re.SlaveCols1.ToArray()) + " "
+ string.Join(", ", re.SlaveCols2.ToArray())
);
}
And the log is:
SELECT [t0].[Id], [t1].[SlaveCol1], (
SELECT COUNT(*)
FROM [FR].[Slave1] AS [t2]
WHERE [t2].[MasterId] = [t0].[Id]
) AS [value]
FROM [FR].[MasterTable] AS [t0]
LEFT OUTER JOIN [FR].[Slave1] AS [t1] ON [t1].[MasterId] = [t0].[Id]
WHERE [t0].[MasterColumn1] = @p0
ORDER BY [t0].[Id], [t1].[Id]
-- @p0: Input NVarChar (Size = 3; Prec = 0; Scale = 0) [wtf]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.5420
SELECT [t0].[SlaveColumn2]
FROM [FR].[Slave2] AS [t0]
WHERE [t0].[MasterId] = @x1
-- @x1: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.5420
1 SlaveCol1Wtf SlaveCol2Wtf
Why oh why does it not do two outer joins instead? I really care about this because I have a much larger db with many tables referring to the one and the same table (all have one-to-many relation) and having 20 selects round-trips to databaseserver is not optimal!
As i side-note. I can produce wanted result by using explicit outer-joins like so:
var db = new TestDbDataContext() { Log = Console.Out };
var res = from f in db.MasterTables
join s1 in db.Slave1s on f.Id equals s1.MasterId into s1Tbl
from s1 in s1Tbl.DefaultIfEmpty()
join s2 in db.Slave2s on f.Id equals s2.MasterId into s2Tbl
from s2 in s2Tbl.DefaultIfEmpty()
where f.MasterColumn1 == "wtf"
select new { f.Id, s1.SlaveCol1, s2.SlaveColumn2 };
foreach (var re in res)
{
Console.Out.WriteLine(re.Id + " " + re.SlaveCol1 + " " + re.SlaveColumn2);
}
But i want to use the references Linq-To-Sql provides and not manual joins! How?
----------- edit -----------------
I've also tried prefetching like this:
using (new DbConnectionScope())
{
var db = new TestDbDataContext() { Log = Console.Out };
DataLoadOptions loadOptions = new DataLoadOptions();
loadOptions.LoadWith<MasterTable>(c => c.Slave1s);
loadOptions.LoadWith<MasterTable>(c => c.Slave2s);
db.LoadOptions = loadOptions;
var res = from f in db.MasterTables
where f.MasterColumn1 == "wtf"
select f;
foreach (var re in res)
{
Console.Out.WriteLine(re.Id + " " +
string.Join(", ", re.Slave1s.Select(s => s.SlaveCol1).ToArray()) + " " +
string.Join(", ", re.Slave2s.Select(s => s.SlaveColumn2).ToArray()));
}
}
same result =(
You are on the right track with the prefetching option using LoadOptions and traversing over the associations rather than explicit joins, however since you are trying to do multiple 1-M navigations from your MasterTable, you would effectively be creating a cartesian product between the Slave1 and Slave2 records. As a result, LINQ to SQL ignores your load options and lazy loads the child records for each of your children.
You can optimize this slightly by removing your second child load option. The generated query will now do a single request returning your MasterTable and Slave1s, but then lazy load each of the Slave2s. You should see the same thing if you do the following:
var res = from f in db.MasterTables
where f.MasterColun1 == "wtf"
select new
{
f.Id,
Cols1 = f.Slave1s.Select(s => s.SlaveCol1).ToArray()
Cols2 = f.Slave2s.Select(s => s.SlaveColumn2).ToArray()
}
You should see a left join between MasterTables and Slave1s and then lazy loading of the Slave2s to avoid the cartesian product between Slave1 and Slave2 in the flattened results from SQL.
As to the "why", Linq-to-SQL probably thinks it's making your query better by avoiding multiple outer-joins.
Say you're pulling 20 entries from the master table, and each slave table has 20 entries per entry in the master table. You'll pull 8000 entries across the wire in a single round-trip with an outer-join, as opposed to two round-trips with 400 apiece. There comes a point where it's cheaper to do two round-trips. It might not be correct in this particular case, but there is a strong likelihood that if you join very many tables this way and if you pull out much data per table, it could very easily tip the scales.
You may also want to look into the possibility that LINQ to SQL may be performing both SELECTs in a single round-trip, using multiple result sets. In this case the two-statement approach will probably be much faster than a double outer-join.
After a little more testing, it's becoming obvious that Jim Wooley's answer is more along the right track: apparently Linq to SQL simply decides not to eagerly load any but the first property that you specify. It's weird, too, because it's not exactly lazy-loading it, either. It loads each property in a separate round-trip as part of the initial evaluation of your query. That seems like a pretty significant limitation of LINQ to SQL to me.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With