LEFT OUTER JOIN in LINQ to objects

Question

Consider the following code.
City and CitPlace are joined by CityCode.
What I want to do is perform a LEFT OUTER JOIN between CityPlace and City.

City[] cities = new City[]{
 new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
 new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
 new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
 new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
};


CityPlace[] places = new CityPlace[]{
 new CityPlace{CityCode="0771",Place="Shankar Nagar"},
 new CityPlace{CityCode="0771",Place="Pandari"},
 new CityPlace{CityCode="0771",Place="Energy Park"},

 new CityPlace{CityCode="0751",Place="Baadaa"},
 new CityPlace{CityCode="0751",Place="Nai Sadak"},
 new CityPlace{CityCode="0751",Place="Jayendraganj"},
 new CityPlace{CityCode="0751",Place="Vinay Nagar"},

 new CityPlace{CityCode="0755",Place="Idgah Hills"},

 new CityPlace{CityCode="022",Place="Parel"},
 new CityPlace{CityCode="022",Place="Haaji Ali"},
 new CityPlace{CityCode="022",Place="Girgaon Beach"},

 new CityPlace{CityCode="0783",Place="Railway Station"}};

What I did is

var res = places.GroupJoin(cities,
                           p1=>p1.CityCode,
                           c1=>c1.CityCode,
                           (p2,c2s)=>new {Place=p2.Place,
                                CityName=c2s.Count()==0 ? "NO NAME" 
                                           : c2s.First().CityName });

foreach(var v in res)
 Console.WriteLine(v);

Is this the standard or it is quicky and dirty solution?

Glazed · Accepted Answer

Your own answer is fine, but it's not very elegant. So, yes, it's kind of dirty. There is a standard way of doing a left outer join, which handles your example and would handle cases where there are duplicate cities. Your example can't handle duplicate cities because any duplicates are ignored when you select c2s.First().

The standard left join steps are this:

Create a hierarchy from your data using GroupJoin.
Flatten the hierarchy with SelectMany.

Your GroupJoin is flattening the hierarchy in one step by ignoring everything except the first matching city. That's what's dirty about it. If you were to try to use this code in reverse by taking the cities and left joining them to the places, you would only get one place per city! That's obviously bad. It's better to learn how to do a left join the proper way, and then it will always work.

The SelectMany in step 2 is actually optional if you would prefer to keep the hierarchy and then use nested foreach loops to display them, but I'm going to assume you want to display the data in a flat table format.

If you just want to see the answer to your concrete problem, scroll down to the "Cities and Places" heading below, but first, here's a complete example using two simple string arrays.

Abstract Example with Complete Explanation

Here is a complete example using two arrays of letters instead of your code. I wanted to show a simpler example first. You can copy and paste this into LINQPad and set the Language to "C# Statements" and run it for yourself if you'd like. I highly recommend LINQPad as a tool for testing all sorts of code, not just LINQ. Alternatively, you can also create a console application in Visual Studio.

Here is the code without too many comments. Below it is a version that is heavily annotated. You may want to jump to that if you want to learn exactly what each parameter means.

var leftLetters = new string[]{ "A", "B", "C" };
var rightLetters = new string[]{ "A", "B" };

//Create a hierarchical collection that includes every left item paired with a collection of matching right items (which may be empty if there are no matching right items.)
var groupJoin =
    leftLetters.GroupJoin(
        rightLetters, 
        leftLetter => leftLetter, 
        rightLetter => rightLetter, 
        ( leftLetter, matchingRightLetters ) => new { leftLetter, matchingRightLetters } 
    );

//Flatten the groupJoin hierarchical collection with a SelectMany
var selectMany = 
    groupJoin.SelectMany(           
        groupJoinItem => groupJoinItem.matchingRightLetters.DefaultIfEmpty( "MISSING" ),            
        ( groupJoinItem, rightLetter ) => new {
            LeftLetter = groupJoinItem.leftLetter, 
            RightLetter = rightLetter 
        }
    );

//You can think of the elements of selectMany as "rows" as if this had been a left outer join in SQL. But this analogy breaks down rapidly if you are selecting objects instead of scalar values.
foreach( var row in selectMany )
{
    Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
}

Here's the output, which should be pretty obvious since we all know what a left join is supposed to do.

A, A
B, B
C, MISSING

The heavily annotated version:

var leftLetters = new string[]{ "A", "B", "C" };
var rightLetters = new string[]{ "A", "B" };

//Create a hierarchical collection that includes every left item paired with a collection of matching right items (which may be empty if there are no matching right items.)
var groupJoin =
    leftLetters.GroupJoin(
        rightLetters, //inner: the right hand collection in the join
        leftLetter => leftLetter, //outerKeySelector: There is no property to use as they join key, the letter *is* the key. So this lambda simply returns the parameter itself.
        rightLetter => rightLetter, //innerKeySelector: Same with the rightLetters
        ( leftLetter, matchingRightLetters ) => new { leftLetter, matchingRightLetters } //resultSelector: given an element from the left items, and its matching collection of right items, project them to some class. In this case we are using a new anonymous type. 
    );

//Flatten the groupJoin hierarchical collection with a SelectMany
var selectMany = 
    groupJoin.SelectMany(
        //collectionSelector: given a single element from our collection of group join items from above, provide a collection of its "right" items which we want to flatten out. In this case the right items are in a property of the groupJoinItem itself, but this does not need to be the case! We use DefaultIfEmpty to turn an empty collection into a new collection that has exactly one item instead: the string "MISSING".
        groupJoinItem => groupJoinItem.matchingRightLetters.DefaultIfEmpty( "MISSING" ), 
        //resultSelector: SelectMany does the flattening for us and this lambda gets invoked once for *each right item* in a given left item's collection of right items.
        ( 
            groupJoinItem, //The first parameter is one of the original group join item, including its entire collection of right items, but we will ignore that collection in the body of this lamda and just grab the leftLetter property.
            rightLetter //The second parameter is *one* of the matching right items from the collection of right items we selected in the first lambda we passed into SelectMany.
        )  
            => new {
                LeftLetter = groupJoinItem.leftLetter, //groupJoinItem is one of the original items from the GroupJoin above. We just want the left letter from it.
                RightLetter = rightLetter //This is one of the individual right letters, so just select it as-is.
            }
    );

//You can think of the elements of selectMany as "rows" as if this had been a left outer join in SQL. But this analogy breaks down rapidly if you are selecting objects instead of scalar values.
foreach( var row in selectMany )
{
    Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
}

Again, the output for reference:

A, A
B, B
C, MISSING

The above of using LINQ is often called a "method chain". You take some collections and chain together methods to get what you want. (Most of the time you don't use variables to hold the individual expressions. You just do GroupJoin(...).SelectMany(...), hence why it's called a "method chain". It's very verbose and explicit, and takes a long time to write.

Instead, we can use what's called a "comprehension", "query comprehension" or "LINQ comprehension". Comprehension is an old computer science term from the 1970's, which honestly doesn't make much sense to most people. Instead people call them "LINQ queries", or "LINQ expressions", but those technically apply to method chains as well, because in both cases you are building an expression tree. (Expression trees are outside of the scope of this tutorial.) A LINQ comprehension is a SQL-like syntax for writing LINQ, but it's not SQL! It has nothing to do with actual SQL. Here is the same code written as a query comprehension:

var leftLetters = new string[]{ "A", "B", "C" };
var rightLetters = new string[]{ "A", "B" };

var query = 
    from leftLetter in leftLetters
    join rightLetter in rightLetters
    on leftLetter equals rightLetter into matchingRightLetters
    from rightLetter in matchingRightLetters.DefaultIfEmpty( "MISSING" )
    select new
    {
        LeftLetter = leftLetter,
        RightLetter = rightLetter
    };

foreach( var row in query )
{
    Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
}

This will compile to the exact same code as the example above, except that the parameter named "groupJoinItem" in the SelectMany will be named something like "temp0" because that parameter doesn't explicitly exist in the comprehension version of this code.

I think you can appreciate how much simpler this version of the code is. I always use this syntax when doing a left outer join. I never use GroupJoin with SelectMany. However, at first glance it makes very little sense. A join followed by an into creates a GroupJoin. You first have to know this, and why you'd want this. Then the second from indicates a SelectMany, which is non-obvious. When you have two from keywords you are effectively creating a cross join (cartesian product), which is what SelectMany is doing. (Sort of.)

For example, this query:

from leftLetter in leftLetters
from rightLetter in rightLetters
select new
{
    LeftLetter = leftLetter,
    RightLetter = rightLetter
}

would yield:

A, A
A, B
B, A
B, B
C, A
C, B

It's a basic cross join.

So, back to our original left join LINQ query: The first from of the query is the group join, and the second from expresses a cross join between each groupJoinItem and its own collection of matching right letters. It's sort of like this:

from groupJoinItem in groupJoin
from rightLetter in groupJoinItem.matchingRightLetters
select new{...}

In fact, we could actually write it as such!

var groupJoin = 
    from leftLetter in leftLetters
    join rightLetter in rightLetters
    on leftLetter equals rightLetter into matchingRightLetters
    select new 
    {
        LeftLetter = leftLetter,
        MatchingRightLetters = matchingRightLetters
    };


var selectMany = 
    from groupJoinItem in groupJoin 
    from rightLetter in groupJoinItem.MatchingRightLetters.DefaultIfEmpty( "MISSING" )
    select new
    {
        LeftLetter = groupJoinItem.LeftLetter,
        RightLetter = rightLetter
    };

That selectMany expresses the following: "for each item in groupJoin, cross join it with its own MatchingRightLetters property, and concatenate all of the results together." This gives the exact same result as any of our left join code above.

That's probably way too much of an explanation for this simple question, but I dislike cargo cult programming (google it). You should know exactly what your code is doing, and why, or else you won't be able to tackle more difficult problems.

Cities and Places

So, here's the method chain version of your code. It's a whole program so people can run it if they'd like (Use the "C# Program" language type in LINQPad or make a console app with Visual Studio or the C# Compiler.)

void Main()
{
    City[] cities = new City[]{
        new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
        new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
        new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
        new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
    };  

    CityPlace[] places = new CityPlace[]{
        new CityPlace{CityCode="0771",Place="Shankar Nagar"},
        new CityPlace{CityCode="0771",Place="Pandari"},
        new CityPlace{CityCode="0771",Place="Energy Park"},

        new CityPlace{CityCode="0751",Place="Baadaa"},
        new CityPlace{CityCode="0751",Place="Nai Sadak"},
        new CityPlace{CityCode="0751",Place="Jayendraganj"},
        new CityPlace{CityCode="0751",Place="Vinay Nagar"},

        new CityPlace{CityCode="0755",Place="Idgah Hills"},

        new CityPlace{CityCode="022",Place="Parel"},
        new CityPlace{CityCode="022",Place="Haaji Ali"},
        new CityPlace{CityCode="022",Place="Girgaon Beach"},

        new CityPlace{CityCode="0783",Place="Railway Station"}
    };

    var query = 
        places.GroupJoin(
            cities,
            place => place.CityCode,
            city => city.CityCode,
            ( place, matchingCities ) 
                => new {
                    place,
                    matchingCities
                }
        ).SelectMany(
            groupJoinItem => groupJoinItem.matchingCities.DefaultIfEmpty( new City{ CityName = "NO NAME" } ),
            ( groupJoinItem, city )
                => new {
                    Place = groupJoinItem.place,
                    City = city
                }
        );              

    foreach(var pair in query)
    {
        Console.WriteLine( pair.Place.Place + ": " + pair.City.CityName );
    }
}

class City
{
    public string CityCode;
    public string CityName;
    public string CityPopulation;
}

class CityPlace
{
    public string CityCode;
    public string Place;
}

Here's the output:

Shankar Nagar: Raipur
Pandari: Raipur
Energy Park: Raipur
Baadaa: Gwalior
Nai Sadak: Gwalior
Jayendraganj: Gwalior
Vinay Nagar: Gwalior
Idgah Hills: Bhopal
Parel: Mumbai
Haaji Ali: Mumbai
Girgaon Beach: Mumbai
Railway Station: NO NAME

Notice that the DefaultIfEmpty will return a new instance of the actual City class and not just a string. This is because we are joining the CityPlaces to actual City objects, not strings. You could use DefaultIfEmpty() instead, with no parameter, and you would get a null City for "Railway Station", but then you'd have to check for nulls in your foreach loop before calling pair.City.CityName. It's a matter of personal preference.

Here's the same program using a query comprehension:

void Main()
{
    City[] cities = new City[]{
        new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
        new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
        new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
        new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
    };  

    CityPlace[] places = new CityPlace[]{
        new CityPlace{CityCode="0771",Place="Shankar Nagar"},
        new CityPlace{CityCode="0771",Place="Pandari"},
        new CityPlace{CityCode="0771",Place="Energy Park"},

        new CityPlace{CityCode="0751",Place="Baadaa"},
        new CityPlace{CityCode="0751",Place="Nai Sadak"},
        new CityPlace{CityCode="0751",Place="Jayendraganj"},
        new CityPlace{CityCode="0751",Place="Vinay Nagar"},

        new CityPlace{CityCode="0755",Place="Idgah Hills"},

        new CityPlace{CityCode="022",Place="Parel"},
        new CityPlace{CityCode="022",Place="Haaji Ali"},
        new CityPlace{CityCode="022",Place="Girgaon Beach"},

        new CityPlace{CityCode="0783",Place="Railway Station"}
    };

    var query = 
        from place in places
        join city in cities
        on place.CityCode equals city.CityCode into matchingCities
        from city in matchingCities.DefaultIfEmpty( new City{ CityName = "NO NAME" } )
        select new {
            Place = place,
            City = city
        };      

    foreach(var pair in query)
    {
        Console.WriteLine( pair.Place.Place + ": " + pair.City.CityName );
    }
}

class City
{
    public string CityCode;
    public string CityName;
    public string CityPopulation;
}

class CityPlace
{
    public string CityCode;
    public string Place;
}

As a long-time SQL user, I greatly prefer the query comprehension version. It's much easier for someone else to read the intent of the code once you know what the individual pieces of the query do.

Happy programming!

ScottS · Answer

Here is a linq query version

var noCity = new City {CityName = "NO NAME"};
var anotherway = from p in places
                 join c in cities on p.CityCode equals c.CityCode into merge
                 from c in merge.DefaultIfEmpty(noCity)
                 select new { p.Place, c.CityName };

I think the use of DefaultIfEmpty() makes it a little more clear.

All in all, I find outer joins in linq pretty darn confusing. It's one of the few places where I find SQL queries dramatically superior.

Adeel · Answer

In your case, you are not grouping the records so don't use your solution. you can use the solution from ScottS or use the query below.

var res = from p in places
                       select new
                       {
                           Place = p.Place,
                           CityName = (from c in cities where p.CityCode == c.CityCode select c.CityName).DefaultIfEmpty("NO NAME").ElementAtOrDefault(0)
                       };

LEFT OUTER JOIN in LINQ to objects

Tags:

c#

linq-to-objects

Akshay J

3 Answers

Abstract Example with Complete Explanation

Cities and Places

Glazed

ScottS

Adeel

Recent Activity

Donate For Us

LEFT OUTER JOIN in LINQ to objects

Tags:

c#

linq-to-objects

Akshay J

3 Answers

Abstract Example with Complete Explanation

Cities and Places

Glazed

ScottS

Adeel

Related questions

Recent Activity

Donate For Us