It appears that the Entity Framework (latest version from NuGet) may be ignoring the HasRequired configuration when constructing joins for Navigation Properties other than the first one defined.
For example, given a POCO object (Person) with the following configuration:
var person = modelBuilder.Entity<Person>();
person.ToTable("The_Peoples");
person.HasKey(i => i.Id);
person.Property(i => i.Id).HasColumnName("the_people_id");
person.HasRequired(i => i.Address)
.WithMany()
.Map(map => map.MapKey("address_id"));
person.HasRequired(i => i.WorkPlace)
.WithMany()
.Map(map => map.MapKey("work_place_id"));
I'm attempting to load a list of people with the following query:
myContext.Set<People>()
.Include(o => o.Address)
.Include(o => o.WorkPlace);
Entity Framework generates the following query:
FROM [dbo].[The_Peoples] AS [Extent1]
INNER JOIN [dbo].[The_Addresses] AS [Extent2] ON [Extent1].[address_id] = [Extent2].[address_id]
LEFT OUTER JOIN [dbo].[The_Work_Places] AS [Extent3] ON [Extent1].[work_place_id] = [Extent3].[work_place_id]
Notice that the join to the *The_Addresses* table is an inner join (as expected), however, the subsequent join to the *The_Work_Places* is an outer join. Given that both the Address and WorkPlace properties are marked as required, I would expect both joins to be inner joins. I've also attempted marking the Address and WorkPlace properties with the Required attribute, but this had no effect.
Is this a bug or am I perhaps misconfiguring something? Suggestions?
Your model configuration is correct and I think it is not a bug but it is behaviour by design, but I cannot tell exactly what design. I've also seen that SQL in such queries. Only a few remarks:
The query you are seeing is not specific to EF 4.2. It would also occur for EF 4.1 and EF 4.0. But not for EF 1 (.NET 3.5). In EF 1 every Include
, also the first, has been mapped to a LEFT OUTER JOIN
, also for required relationships.
I think one cannot say that using an INNER JOIN
is "correct" for required navigation properties and LEFT OUTER JOIN
is wrong. From a mapping view point it doesn't matter what you use, given that the constraints in the database represent the relationships in the model correctly. For a required navigation property the FK column in the database must not be nullable and there must a constraint in the database which enforces that the FK refers to an existing row in the target table. If that is the case, every JOIN
must return a row, no matter if you use INNER JOIN
or LEFT OUTER JOIN
.
What happens if model and database is "out of sync" regarding the relationships? Basically nonsense happens in both cases: If you use a LEFT OUTER JOIN
and the FK is NULL
in the DB or refers to a not existing row, you'd get an entity where the navigation property is null
, violating the model definition that the property is required. Using an INNER JOIN
is not better: You'd get no entity at all, a query result which is at least as wrong as the result with the LEFT OUTER JOIN
, if not worse.
So, I think the change in .NET 4 to use INNER JOIN
s for some Include
s has been made not because the SQL in EF 1 was wrong but to create better and more performant SQL. This change actually introduced a breaking change in that some queries returned other results now than they did in EF 1: http://thedatafarm.com/blog/data-access/ef4-breaking-change-ef4-inner-joins-affect-eager-loading-many-to-many/
My understanding is that this has been fixed and that the reason was that INNER JOIN
s in too many situations have been introduced for eager loading in EF 4. (Perhaps in this phase (beta/release candidate for EF 4) your query would have had two INNER JOIN
s.) The reply to that problem from the EF team: http://connect.microsoft.com/VisualStudio/feedback/details/534675/ef4-include-method-returns-different-results-than-ef1-include (bold highlight from me):
We are fixing the issue for .net 4 RTM. This was an unintended breaking change. We did not make an intended change where every left outer join produced by an Include became an inner join in .Net 4. But rather the optimization looked at the constraints in the EF metadata and tried to convert those left outer joins which could be safely converted to inner joins based on the constraints. We had a bug in the code where we were reasoning based on the constraints which resulted in more aggressive conversion than what the constraints implied. We have scaled back the optimization so that we convert left outer joins to inner joins only in the places where we are absolutely sure we can do it based on the constraints. We think we can improve this optimization a little more in the future. You will start seeing more left outer joins for some queries in RTM when compared to RC and Beta 2 but in most of these cases this is needed to return correct results.
So, the final release for EF 4 apparently reintroduced some more LEFT OUTER JOIN
s (compared to beta/release candidate) to avoid a breaking change like that.
Sorry, this is more a historical story than a real explanation why you get an INNER JOIN
and then a LEFT OUTER JOIN
. As said, it is not wrong to write the query this way - as it wouldn't be wrong to use two INNER JOIN
s or two LEFT OUTER JOIN
s. I guess that only the EF team can explain exactly why your query produces that specific SQL.
I'd recommend - if you don't experience serious performance problems - not to worry about that SQL (since the result you get is correct after all) and proceed. Not liking the SQL which EF creates ends up in writing either a lot of feature and change requests or in writing a lot of raw SQL queries or in abandoning EF at all.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With