Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Left Outer Join two DataTables in c#?

How can I Left Outer Join two data tables with the following tables and conditions while keeping all columns from both tables?

dtblLeft:

 id   col1   anotherColumn2
 1    1      any2
 2    1      any2
 3    2      any2
 4    3      any2
 5    3      any2
 6    3      any2
 7           any2

dtblRight:

 col1   col2      anotherColumn1
 1      Hi        any1
 2      Bye       any1
 3      Later     any1
 4      Never     any1

dtblJoined:

 id   col1  col2     anotherColumn1     anotherColumn2
 1    1     Hi       any1               any2
 2    1     Hi       any1               any2
 3    2     Bye      any1               any2
 4    3     Later    any1               any2
 5    3     Later    any1               any2
 6    3     Later    any1               any2
 7                                      any2

Conditions:

  • In dtblLeft, col1 is not required to have unique values.
  • In dtblRight, col1 has unique values.
  • If dtblLeft is missing a foreign key in col1 or it has one that does not exist in dtblRight then empty or null fields will be inserted.
  • Joining on col1.

I can use regular DataTable operations, LINQ, or whatever.

I tried this but it removes duplicates:

dtblA.PrimaryKey = new DataColumn[] {dtblA.Columns["col1"]}

DataTable dtblJoined = new DataTable();
dtblJoined.Merge(dtblA, false, MissingSchemaAction.AddWithKey);
dtblJoined.Merge(dtblB, false, MissingSchemaAction.AddWithKey);

EDIT 1:

This is close to I what I want but it only has columns from one of the tables ( found at this link ):

    dtblJoined = (from t1 in dtblA.Rows.Cast<DataRow>()
                  join t2 in dtblB.Rows.Cast<DataRow>() on t1["col1"] equals t2["col1"]
                  select t1).CopyToDataTable();

EDIT 2:

An answer from this link seems to work for me but I had to change it a bit as follows:

DataTable targetTable = dtblA.Clone();
var dt2Columns = dtblB.Columns.OfType<DataColumn>().Select(dc =>
new DataColumn(dc.ColumnName, dc.DataType, dc.Expression, dc.ColumnMapping));
var dt2FinalColumns = from dc in dt2Columns.AsEnumerable()
                   where targetTable.Columns.Contains(dc.ColumnName) == false
                   select dc;

targetTable.Columns.AddRange(dt2FinalColumns.ToArray());

var rowData = from row1 in dtblA.AsEnumerable()
                          join row2 in dtblB.AsEnumerable()
                          on row1["col1"] equals row2["col1"]
                          select row1.ItemArray.Concat(row2.ItemArray.Where(r2 => row1.ItemArray.Contains(r2) == false)).ToArray();

 foreach (object[] values in rowData)
      targetTable.Rows.Add(values);

I also found this link and I might try that out since it seems more concise.

EDIT 3 (11/18/2013):

Updated tables to reflect more situations.

like image 825
Soenhay Avatar asked Jul 16 '13 18:07

Soenhay


People also ask

What is left outer join?

A left outer join is a method of combining tables. The result includes unmatched rows from only the table that is specified before the LEFT OUTER JOIN clause. If you are joining two tables and want the result set to include unmatched rows from only one table, use a LEFT OUTER JOIN clause or a RIGHT OUTER JOIN clause.

Can we do left join in Linq?

A left outer join is a join in which each element of the first collection is returned, regardless of whether it has any correlated elements in the second collection. You can use LINQ to perform a left outer join by calling the DefaultIfEmpty method on the results of a group join.


2 Answers

Thanks all for your help. Here is what I came up with based on multiple resources:

public static class DataTableHelper
{
    public enum JoinType
    {
        /// <summary>
        /// Same as regular join. Inner join produces only the set of records that match in both Table A and Table B.
        /// </summary>
        Inner = 0,
        /// <summary>
        /// Same as Left Outer join. Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.
        /// </summary>
        Left = 1
    }

    /// <summary>
    /// Joins the passed in DataTables on the colToJoinOn.
    /// <para>Returns an appropriate DataTable with zero rows if the colToJoinOn does not exist in both tables.</para>
    /// </summary>
    /// <param name="dtblLeft"></param>
    /// <param name="dtblRight"></param>
    /// <param name="colToJoinOn"></param>
    /// <param name="joinType"></param>
    /// <returns></returns>
    /// <remarks>
    /// <para>http://stackoverflow.com/questions/2379747/create-combined-datatable-from-two-datatables-joined-with-linq-c-sharp?rq=1</para>
    /// <para>http://msdn.microsoft.com/en-us/library/vstudio/bb397895.aspx</para>
    /// <para>http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html</para>
    /// <para>http://stackoverflow.com/questions/406294/left-join-and-left-outer-join-in-sql-server</para>
    /// </remarks>
    public static DataTable JoinTwoDataTablesOnOneColumn(DataTable dtblLeft, DataTable dtblRight, string colToJoinOn, JoinType joinType)
    {
        //Change column name to a temp name so the LINQ for getting row data will work properly.
        string strTempColName = colToJoinOn + "_2";
        if (dtblRight.Columns.Contains(colToJoinOn))
            dtblRight.Columns[colToJoinOn].ColumnName = strTempColName;

        //Get columns from dtblLeft
        DataTable dtblResult = dtblLeft.Clone();

        //Get columns from dtblRight
        var dt2Columns = dtblRight.Columns.OfType<DataColumn>().Select(dc => new DataColumn(dc.ColumnName, dc.DataType, dc.Expression, dc.ColumnMapping));

        //Get columns from dtblRight that are not in dtblLeft
        var dt2FinalColumns = from dc in dt2Columns.AsEnumerable()
                              where !dtblResult.Columns.Contains(dc.ColumnName)
                              select dc;

        //Add the rest of the columns to dtblResult
        dtblResult.Columns.AddRange(dt2FinalColumns.ToArray());

        //No reason to continue if the colToJoinOn does not exist in both DataTables.
        if (!dtblLeft.Columns.Contains(colToJoinOn) || (!dtblRight.Columns.Contains(colToJoinOn) && !dtblRight.Columns.Contains(strTempColName)))
        {
            if (!dtblResult.Columns.Contains(colToJoinOn))
                dtblResult.Columns.Add(colToJoinOn);
            return dtblResult;
        }

        switch (joinType)
        {

            default:
            case JoinType.Inner:
                #region Inner
                //get row data
                //To use the DataTable.AsEnumerable() extension method you need to add a reference to the System.Data.DataSetExtension assembly in your project. 
                var rowDataLeftInner = from rowLeft in dtblLeft.AsEnumerable()
                                       join rowRight in dtblRight.AsEnumerable() on rowLeft[colToJoinOn] equals rowRight[strTempColName]
                                       select rowLeft.ItemArray.Concat(rowRight.ItemArray).ToArray();


                //Add row data to dtblResult
                foreach (object[] values in rowDataLeftInner)
                    dtblResult.Rows.Add(values);

                #endregion
                break;
            case JoinType.Left:
                #region Left
                var rowDataLeftOuter = from rowLeft in dtblLeft.AsEnumerable()
                                       join rowRight in dtblRight.AsEnumerable() on rowLeft[colToJoinOn] equals rowRight[strTempColName] into gj
                                       from subRight in gj.DefaultIfEmpty()
                                       select rowLeft.ItemArray.Concat((subRight== null) ? (dtblRight.NewRow().ItemArray) :subRight.ItemArray).ToArray();


                //Add row data to dtblResult
                foreach (object[] values in rowDataLeftOuter)
                    dtblResult.Rows.Add(values);

                #endregion
                break;
        }

        //Change column name back to original
        dtblRight.Columns[strTempColName].ColumnName = colToJoinOn;

        //Remove extra column from result
        dtblResult.Columns.Remove(strTempColName);

        return dtblResult;
    }
}

EDIT 3:

This method now works correctly and it is still fast when the tables have 2000+ rows. Any recommendations/suggestions/improvements would be appreciated.

EDIT 4:

I had a certain scenario that led me to realize the previous version was really doing an inner join. The function has been modified to fix that problem. I used info at this link to figure it out.

like image 108
Soenhay Avatar answered Oct 13 '22 05:10

Soenhay


This is simply an inner join between the 2 tables:

var query = (from x in a.AsEnumerable()
              join y in b.AsEnumerable() on x.Field<int>("col1") equals y.Field<int>("col1")
              select new { col1= y.Field<int>("col1"), col2=x.Field<int>("col2") }).ToList();

Produces:

col1 col2
1    Hi 
1    Hi 
2    Bye 
3    Later 
3    Later 
3    Later 
like image 30
Icarus Avatar answered Oct 13 '22 06:10

Icarus