We have a table called Student
. That table has a field called Homeroom
, where the value is a room number of the student's homeroom. The value can be null.
We have a second table called Staff
. That table also has a field called Homeroom
to indicate which homeroom the teacher is assigned to. The value can be null.
But when the student's Homeroom
is null, a Staff
record should not be returned.
We used to take advantage of the fact that checking two null fields for equality always returns false in SQL. Through SQL, this is how we would get the data we want:
SELECT STUDENT.ID, STAFF.NAME as [Homeroom Teacher]
FROM STUDENT
LEFT OUTER JOIN STAFF ON
STAFF.BUILDING = STUDENT.BUILDING AND
STAFF.HOMEROOM = STUDENT.HOMEROOM
Student would be returned, but no teacher.
We are using Entity Framework with Code First POCO objects. So, we have a Student object and a Staff object. When we recreate this SQL in LINQ:
from student in repo.GetStudents()
join homeroomTeacher in repo.GetStaff()
new { student.Building, Room = student.Homeroom }
equals new { homeroomTeacher.Building, Room = homeroomTeacher.Homeroom }
into roj2
from homeroomTeacherRoj in roj2.DefaultIfEmpty()
select student.Id, homeroomTeacherRoj.Name;
The SQL generated contains a NULL check on both Homeroom fields:
SELECT STUDENT.ID, STAFF.NAME
FROM STUDENT AS [Extent1]
LEFT OUTER JOIN [dbo].[STAFF] AS [Extent2] ON
([Extent1].[BUILDING] = [Extent2].[BUILDING]) AND
(
([Extent1].[HOMEROOM] = [Extent2].[HOMEROOM]) OR
(([Extent1].[HOMEROOM] IS NULL) AND ([Extent2].[HOMEROOM] IS NULL))
)
This will return the student, and any staff who does not have a homeroom defined. That's not what we wanted or expected based on how we previously wrote our SQL statements.
An obvious way around it is to make sure we don't include staff that do not have a homeroom (join homeroomTeacher in repo.GetStaff().Where(staff => staff.Homeroom != null)
. But is there another way in the LINQ to prevent null checks on fields when joining them?
if you move your join into a where clause, then the following setting on the DbContext object will turn off the (EF 6 introduced) NULL checking behaviour:
Context.Configuration.UseDatabaseNullSemantics = true;
So, to "Join" in a where clause, you can split out the query into 2 IQueryable objects
var subquery = from homeroomTeacher in repo.GetStaff()
where ...
select homeroomTeacher;
var query = from student in repo.GetStudents()
where subquery.Any(homeroomTeacher =>
homeroomTeacher.xxx == student.xxx) -- simplified join for demo code
select student;
fyi, the UseDatabaseNullSemantics was introduced to fix-up this behaviour but it looks like they forgot the JOIN semantics and only applied it to the WHERE semantics.
This original statement was wrong - EF 4.3.1 exhibited the same JOIN behaviour:
This essentially means some result sets are now different for EF 6, compared to previous versions. This, in my mind, IS A BIG DEAL !!!! as it introduced bugs into my working solutions !!!!
I have raised an issue on codeplex:https://entityframework.codeplex.com/workitem/2006
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With