
LINQ Join on a Nullable key

Tags:

c#

linq

The LINQ Join() method with Nullable<int> for TKey skips over null key matches. What am I missing in the documentation? I know that I can switch to SelectMany(); I'm just curious why this equality operation works like SQL and not like C#, since as near as I can tell EqualityComparer<int?>.Default works exactly as I would expect for null values.

http://msdn.microsoft.com/en-us/library/bb534675.aspx

using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;

public class dt
{
   public int? Id;
   public string Data;
}

public class JoinTest
{
    public static int Main(string [] args)
    {
        var a = new List<dt>
        {
            new dt { Id = null, Data = "null" },
            new dt { Id = 1, Data = "1" },
            new dt { Id = 2, Data = "2" }
        };

        var b = new List<dt>
        {
            new dt { Id = null, Data = "NULL" },
            new dt { Id = 2, Data = "two" },
            new dt { Id = 3, Data = "three" }
        };

        //Join with null elements
        var c = a.Join( b,
            dtA => dtA.Id,
            dtB => dtB.Id,
            (dtA, dtB) => new { aData = dtA.Data, bData = dtB.Data } ).ToList();
        // Output:
        // 2 two
        foreach ( var aC in c )
            Console.WriteLine( aC.aData + " " + aC.bData );
        Console.WriteLine( " " );

        //Join with null elements converted to zero
        c = a.Join( b,
            dtA => dtA.Id.GetValueOrDefault(),
            dtB => dtB.Id.GetValueOrDefault(),
            (dtA, dtB) => new { aData = dtA.Data, bData = dtB.Data } ).ToList();

        // Output:
        // null NULL
        // 2 two
        foreach ( var aC in c )
            Console.WriteLine( aC.aData + " " + aC.bData );

        Console.WriteLine( EqualityComparer<int?>.Default.Equals( a[0].Id, b[0].Id ) );
        Console.WriteLine( EqualityComparer<object>.Default.Equals( a[0].Id, b[0].Id ) );
        Console.WriteLine( a[0].Id.Equals( b[0].Id ) );

        return 0;
    }
}
ryancerium asked Jun 13 '13 16:06



1 Answer

Enumerable.Join uses JoinIterator (a private iterator method) to iterate over matching elements. JoinIterator builds a Lookup<TKey, TElement> over the keys of the inner sequence by calling CreateForJoin:

internal static Lookup<TKey, TElement> CreateForJoin(
    IEnumerable<TElement> source, 
    Func<TElement, TKey> keySelector, 
    IEqualityComparer<TKey> comparer)
{
    Lookup<TKey, TElement> lookup = new Lookup<TKey, TElement>(comparer);
    foreach (TElement local in source)
    {
        TKey key = keySelector(local);
        if (key != null) // <--- Here
        {
            lookup.GetGrouping(key, true).Add(local);
        }
    }
    return lookup;
}

The interesting part is that keys which are null are skipped while the lookup is built, so elements with a null key never make it into the lookup. That is why, without substituting a default value, you get only one match.
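If null keys should match each other, two common workarounds can be sketched as follows. This is only an illustration, not part of the framework code quoted above: the Row class and the NullKeyJoinSketch name are hypothetical, and the data mirrors the lists from the question. The first query wraps the key in an anonymous type, so the key object handed to Join is never null while its Equals still compares the inner int? values. The second uses SelectMany, as the question mentions, with an explicit comparer call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type, mirroring the "dt" class from the question.
public class Row
{
    public int? Id;
    public string Data;
}

public static class NullKeyJoinSketch
{
    public static void Main()
    {
        var a = new List<Row>
        {
            new Row { Id = null, Data = "null" },
            new Row { Id = 1, Data = "1" },
            new Row { Id = 2, Data = "2" }
        };

        var b = new List<Row>
        {
            new Row { Id = null, Data = "NULL" },
            new Row { Id = 2, Data = "two" },
            new Row { Id = 3, Data = "three" }
        };

        // Workaround 1: wrap the key in an anonymous type. The key object
        // itself is never null, so it is not skipped, and anonymous types
        // compare their properties by value, so two null Ids are equal.
        var wrapped = a.Join(b,
            x => new { x.Id },
            y => new { y.Id },
            (x, y) => x.Data + " " + y.Data);

        // Output:
        // null NULL
        // 2 two
        foreach (var line in wrapped)
            Console.WriteLine(line);

        // Workaround 2: SelectMany with an explicit equality check.
        // This is O(n * m), unlike the hash-based Join.
        var viaSelectMany = a.SelectMany(
            x => b.Where(y => EqualityComparer<int?>.Default.Equals(x.Id, y.Id)),
            (x, y) => x.Data + " " + y.Data);

        // Output:
        // null NULL
        // 2 two
        foreach (var line in viaSelectMany)
            Console.WriteLine(line);
    }
}
```

Unlike GetValueOrDefault(), neither approach conflates a null key with a key of 0.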


It looks like I found a possible reason for this behavior. Lookup uses the default EqualityComparer, whose GetHashCode returns 0 both for a key that is null and for a key that is 0:

int? keyA = 0;
var comparer = EqualityComparer<int?>.Default;
int hashA = comparer.GetHashCode(keyA) & 0x7fffffff; // from Lookup class
int? keyB = null;
int hashB = comparer.GetHashCode(keyB) & 0x7fffffff;
Console.WriteLine(hashA); // 0
Console.WriteLine(hashB); // 0

Possibly nulls are skipped to avoid matching null and 0 keys.
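When weighing that guess, it may help to separate hash codes from equality. The short check below is a sketch (the HashVersusEquals class name is hypothetical). It shows that the default comparer produces the same hash code for null and 0 but still reports them as unequal, so a hash-based collection such as HashSet<int?> keeps them as two distinct entries.

```csharp
using System;
using System.Collections.Generic;

public static class HashVersusEquals
{
    public static void Main()
    {
        var comparer = EqualityComparer<int?>.Default;
        int? zero = 0;
        int? none = null;

        // Both hash codes are 0, so the two keys land in the same bucket.
        Console.WriteLine(comparer.GetHashCode(zero) == comparer.GetHashCode(none)); // True

        // Equality is still decided by Equals, which tells them apart.
        Console.WriteLine(comparer.Equals(zero, none)); // False

        // A hash-based set therefore stores null and 0 as separate items.
        var set = new HashSet<int?> { zero, none };
        Console.WriteLine(set.Count); // 2
    }
}
```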

Sergey Berezovskiy answered Sep 23 '22 20:09
