Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Left outer join using LINQ -- understanding the code

Tags:

c#

sql

linq

I would be grateful if someone could explain the meaning of the term into while using LINQ. In general, I am trying to understand how to make INNER JOIN, LEFT OUTER JOIN etc. in C#.

I have the main table Students that stores a few foreign ID keys which then are substituted by their names when running a query. The names are read from look up tables such as Marks, SoftwareVersions, Departments etc. All fields are required but MarkID. The query I tried to build in LINQ is this:

SELECT * FROM dbo.Students
INNER JOIN dbo.Departments ON dbo.Students.DepartmentID=dbo.Departments.DepartmentID
INNER JOIN dbo.SoftwareVersions ON dbo.Students.SoftwareVersionID=dbo.SoftwareVersions.SoftwareVersionID
INNER JOIN dbo.Statuses ON dbo.Students.StatusID=dbo.Statuses.StatusID
LEFT JOIN dbo.Marks ON dbo.Students.MarkID=dbo.Marks.MarkID
WHERE dbo.Students.DepartmentID=17;

I somehow managed to get the code below worked after reading plenty of articles and watching some videos but I don't feel like I have a complete understanding of the code. The bits that confuse me are in 5th line ending with into and then in the very next line beginning with from m .... I'm confused what into does and and what really happens in from m .... And this is the code in LINQ:

var result = from st in dbContext.Students where st.DepartmentID == 17
             join d in dbContext.Departments on st.DepartmentID equals d.DepartmentID
             join sv in dbContext.SoftwareVersions on st.SoftwareVersionID equals sv.SoftwareVersionID
             join stat in dbContext.Statuses on st.StatusID equals stat.StatusID
             join m in dbContext.Marks on st.MarkID equals m.MarkID into marksGroup
             from m in marksGroup.DefaultIfEmpty()
             select new
             {
                 student = st.StudentName,
                 department = p.DepartmentName,
                 software = sv.SoftwareVersionName,
                 status = st.StatusName,
                 marked = m != null ? m.MarkName : "-- Not marked --"
             };
like image 363
Celdor Avatar asked Nov 21 '14 10:11

Celdor


People also ask

How left outer join implement in Linq?

A left outer join is a join in which each element of the first collection is returned, regardless of whether it has any correlated elements in the second collection. You can use LINQ to perform a left outer join by calling the DefaultIfEmpty method on the results of a group join.

What is a left outer join?

A left outer join is a method of combining tables. The result includes unmatched rows from only the table that is specified before the LEFT OUTER JOIN clause. If you are joining two tables and want the result set to include unmatched rows from only one table, use a LEFT OUTER JOIN clause or a RIGHT OUTER JOIN clause.

Is Linq join inner or outer?

One commonly used feature of Language-Integrated Query (LINQ) is the facility to combine two sequences of related data using joins. The standard join operation provides an inner join but with a minor modification can be changed to give a left outer join.


2 Answers

I believe Example section from How to: Perform Left Outer Joins MSDN page is really well explained. Let's project it to your example. To quote first paragraph from the page

The first step in producing a left outer join of two collections is to perform an inner join by using a group join. (See How to: Perform Inner Joins (C# Programming Guide) for an explanation of this process.) In this example, the list of Person objects is inner-joined to the list of Pet objects based on a Person object that matches Pet.Owner.

So in your case, the first step is to perform an inner join of list of Students objects with the list of Marks objects based on MarkID in Students object matches MarkID in Marks object. As can be seen in the quote, inner join is being performed using group join. If you check Note section in MSDN page on how to perform group join, you can see that

Each element of the first collection appears in the result set of a group join regardless of whether correlated elements are found in the second collection. In the case where no correlated elements are found, the sequence of correlated elements for that element is empty. The result selector therefore has access to every element of the first collection.

What this means in the context of your example, is that by using into you have group joined results where you have all Students objects, and sequence of correlated elements of Marks objects (in case there is no matching Marks objects, the sequence is going to be empty).

Now let's go back to How to: Perform Left Outer Joins MSDN page, in particular second paragraph

The second step is to include each element of the first (left) collection in the result set even if that element has no matches in the right collection. This is accomplished by calling DefaultIfEmpty on each sequence of matching elements from the group join. In this example, DefaultIfEmpty is called on each sequence of matching Pet objects. The method returns a collection that contains a single, default value if the sequence of matching Pet objects is empty for any Person object, thereby ensuring that each Person object is represented in the result collection.

Again, to project this to your example, DefaultIsEmpty() is being called on each sequence of matching Marks objects. As explained above, the method returns a collection that contains a single, default value if the sequence of matching Marks objects is empty for any Student object, which ensures each Student object will be represented in the resulting collection. As a result what you have is set of elements, that contain all Student objects, and matching Marks object, or if there is no matching Marks object, default value of Marks, which in this case is null.

like image 140
Michael Avatar answered Sep 28 '22 22:09

Michael


what I can say is that "into MarksGroup" stores the result data of your joined tables into a temporary (application based, not database based) resultset (in sql terms: a table, so its a SELECT INTO)

In the next line, your code then selects from Marksgroup the columns with your data (in sql terms: SELECT student, department, software, status, marked FROM Marksgroup

So basically, it's getting your data from the db, then putting it aside to "Marksgroup, and in the very next step getting Marksgroup back in your fingers to take out the data you want to use in your c# code.

Try to get rid of Marksgroup, it should be possible (haven't tested ist with your code). It should be something like this:

from st in dbContext.Students where st.DepartmentID == 17
             join d in dbContext.Departments on st.DepartmentID equals d.DepartmentID
             join sv in dbContext.SoftwareVersions on st.SoftwareVersionID equals sv.SoftwareVersionID
             join stat in dbContext.Statuses on st.StatusID equals stat.StatusID
             join m in dbContext.Marks on st.MarkID equals m.MarkID

             select new
             {
                 student = st.StudentName,
                 department = p.DepartmentName,
                 software = sv.SoftwareVersionName,
                 status = st.StatusName,
                 marked = m != null ? m.MarkName : "-- Not marked --"
             };

Your second question with 'm' : This should also show a different behaviour without your temporary resultset "Marksgroup"

like image 30
Markus Avatar answered Sep 29 '22 00:09

Markus