Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I decrease the amount of memory and operations involved in the ""Find all the pairs that add to N" interview challenge?

I was asked in an interview to write a function for finding all pairs of ints in an array that add up to N. My answer was kinda bulky:

HashSet<Tuple<int,int>> PairsThatSumToN ( int [] arr, int N )
{
    HashSet<int> arrhash = new HashShet<int> (arr);
    HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>();    
    for ( int i in arrhash ) 
    {
       int j = N - i;
       if ( arrhash.Contains(j) ) result.Add(new Tuple<int,int> (i,j));
    }
    return result;
}

I'm a beginner to C#, come from a C++ background, and I have a few questions about how to make this better:

  • Is it innefficient to iterate through a HashSet? In other words, would my procedure be more efficient (although less compact) if I changed it to

    HashSet<Tuple<int,int>> PairsThatSumToN ( int [] arr, int N )
    {
       HashSet<int> arrhash = new HashShet<int> ();
       HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>();    
       for ( int i in arr ) 
       {
          int j = N - i;
          if ( arrhash.Contains(j) ) result.Add(new Type<int,int> (i,j));
          arrHash.Add(i);
       }
       return result;
    }
    

    ?????

  • I realize that Add is more like an "Add if not already in there", so I have a useless operation whenever I run result.Add(new Tuple<int,int> (i,j)) for an i,j pair that is already in the set. The more repeated pairs in the array, the more useless operations, and there's all the overhead of allocating the new Tuple that may never be used. Is there a way to optimize this by checking whether the pair i,j is a Tuple in the set before creating a new Tuple out of said pair and trying to add it?

  • Speaking of the above allocation of a new Tuple on the heap, do I need to free this memory if I don't end up adding that Tuple to the result? Potential memory leak here?

  • There has to be some way of combining the two sets

    HashSet<int> arrhash = new HashShet<int> (arr);
    HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>(); 
    

    In a sense, they contain redundant information since every int in the second one is also in the first one. Something feels "wrong" about having to sets here, yet I can't think of a better way to do it.

  • Better yet, does the .NET library have any way of doing a 1-line solution for the problem? ;)

Paging Dr. Skeet.

like image 874
user5572578 Avatar asked Dec 01 '25 03:12

user5572578


2 Answers

This is what I would try

    public Dictionary<int, int> Pairs(int[] arr, int N)
    {
        // int N asssumes no arr > int32 max / 2 
        int len = arr.Length < N ? arr.Length / 2 : N / 2;
        Dictionary<int, int> d = new Dictionary<int, int>(len); 
                          // add is O(1) if count <= capacity 
        if(arr.Length == 0) return d;
        Array.Sort(arr);  // so it is O(n log n) I still take my chances with it
                          // that is n * log(n)         
        int start = 0;
        int end = arr.Length - 1;
        do
        {
            int ttl = arr[start] + arr[end];
            if (ttl == N)
            {
                if(!d.ContainsKey(arr[start]))
                      d.Add(arr[start], arr[end]); 
                      // if start <= end then pair uniquely defined by either 
                      // and a perfect hash (zero collisions)
                start++;
                end--;
            }
            else if (ttl > N)
                end--;
            else
                start++;
            if(start >= end)
                return d;
        }   while (true);
    }

Even with a HashSet based solution still use Dictionary(N/2) with Key <= Value

Or use Dictionary(arr.Length / 2)

like image 88
paparazzo Avatar answered Dec 02 '25 15:12

paparazzo


If you need a neat solution for your problem, here it is, implemented with LINQ.

The performance however, is 4 times worse than your second solution.

Since you have asked for a one liner, here it is anyway.

NOTE: I would appreciate any improvements especially to get rid of that Distinct() since it takes the 50% of the overall cpu time

static List<Pair> PairsThatSumToN(int[] arr, int N)
{
    return
    (
        from x in arr join y in arr on N - x equals y select new Pair(x, y)
    )
    .Distinct()
    .ToList();
}

public class Pair : Tuple<int, int>
{
    public Pair(int item1, int item2) : base(item1, item2) { }

    public override bool Equals(object pair)
    {
        Pair dest = pair as Pair;
        return dest.Item1 == Item1 || dest.Item2 == Item1;
    }

    public override int GetHashCode()
    {
        return Item1 + Item2;
    }
}
like image 22
Oguz Ozgul Avatar answered Dec 02 '25 15:12

Oguz Ozgul