I was asked in an interview to write a function for finding all pairs of ints in an array that add up to N. My answer was kinda bulky:
HashSet<Tuple<int,int>> PairsThatSumToN(int[] arr, int N)
{
    HashSet<int> arrhash = new HashSet<int>(arr);
    HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>();
    foreach (int i in arrhash)
    {
        int j = N - i;
        if (arrhash.Contains(j)) result.Add(new Tuple<int,int>(i, j));
    }
    return result;
}
I'm a beginner to C#, coming from a C++ background, and I have a few questions about how to make this better:
Is it inefficient to iterate through a HashSet? In other words, would my procedure be more efficient (although less compact) if I changed it to
HashSet<Tuple<int,int>> PairsThatSumToN(int[] arr, int N)
{
    HashSet<int> arrhash = new HashSet<int>();
    HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>();
    foreach (int i in arr)
    {
        int j = N - i;
        if (arrhash.Contains(j)) result.Add(new Tuple<int,int>(i, j));
        arrhash.Add(i);
    }
    return result;
}
?????
I realize that Add is more like an "Add if not already in there", so I have a useless operation whenever I run result.Add(new Tuple<int,int> (i,j)) for an i,j pair that is already in the set. The more repeated pairs in the array, the more useless operations, and there's all the overhead of allocating the new Tuple that may never be used. Is there a way to optimize this by checking whether the pair i,j is a Tuple in the set before creating a new Tuple out of said pair and trying to add it?
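One idea I had (completely untested, so this is just a sketch) is to track the pairs I've already seen in a HashSet<long> that packs the two ints into a single key, since HashSet<T>.Add returns false when the element is already present; the Tuple then only gets allocated for genuinely new pairs, and the result can be a plain List:
HashSet<int> arrhash = new HashSet<int>();
HashSet<long> seenPairs = new HashSet<long>();   // packed (i, j) keys, no Tuple needed for the check
List<Tuple<int,int>> result = new List<Tuple<int,int>>();
foreach (int i in arr)
{
    int j = N - i;
    long key = ((long)i << 32) | (uint)j;          // unique per ordered (i, j) pair
    if (arrhash.Contains(j) && seenPairs.Add(key)) // Add doubles as the membership check
        result.Add(new Tuple<int,int>(i, j));      // allocate only when the pair is new
    arrhash.Add(i);
}
But that introduces yet another collection, so maybe I'm overcomplicating it.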
Speaking of the above allocation of a new Tuple on the heap, do I need to free this memory if I don't end up adding that Tuple to the result? Potential memory leak here?
There has to be some way of combining the two sets
HashSet<int> arrhash = new HashSet<int>(arr);
HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>();
In a sense, they contain redundant information, since every int in the second one is also in the first one. Something feels "wrong" about having two sets here, yet I can't think of a better way to do it.
Better yet, does the .NET library have any way of doing a 1-line solution for the problem? ;)
Paging Dr. Skeet.
This is what I would try
public Dictionary<int, int> Pairs(int[] arr, int N)
{
    // Assumes no element exceeds int.MaxValue / 2, so the sum below cannot overflow
    int len = arr.Length < N ? arr.Length / 2 : N / 2;
    Dictionary<int, int> d = new Dictionary<int, int>(len);
    // Add is O(1) as long as Count <= capacity
    if (arr.Length == 0) return d;
    Array.Sort(arr); // O(n log n), but I still take my chances with it
    int start = 0;
    int end = arr.Length - 1;
    do
    {
        int ttl = arr[start] + arr[end];
        if (ttl == N)
        {
            if (!d.ContainsKey(arr[start]))
                d.Add(arr[start], arr[end]);
            // with start <= end the pair is uniquely identified by its smaller element,
            // which also gives a perfect hash (zero collisions)
            start++;
            end--;
        }
        else if (ttl > N)
            end--;
        else
            start++;
        if (start >= end)
            return d;
    } while (true);
}
Even with a HashSet-based solution, I would still use a Dictionary with capacity N / 2 and store each pair as Key <= Value, or alternatively a Dictionary with capacity arr.Length / 2.
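A quick usage sketch (my own example input; it assumes Pairs above is reachable, e.g. made static or called on an instance of its containing class):
int[] arr = { 1, 9, 3, 7, 5, 5 };
Dictionary<int, int> pairs = Pairs(arr, 10);
foreach (var kv in pairs)
    Console.WriteLine(kv.Key + " + " + kv.Value + " = 10");
// Because the array is sorted first, each pair is keyed by its smaller element:
// 1 + 9, 3 + 7 and 5 + 5 are each reported exactly once.
Note that Array.Sort(arr) mutates the caller's array, which may or may not be acceptable.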
If you need a neat solution for your problem, here it is, implemented with LINQ.
The performance, however, is about 4 times worse than your second solution.
Since you asked for a one-liner, here it is anyway.
NOTE: I would appreciate any improvements, especially one that gets rid of that Distinct(), since it accounts for 50% of the overall CPU time.
static List<Pair> PairsThatSumToN(int[] arr, int N)
{
    return
    (
        from x in arr join y in arr on N - x equals y select new Pair(x, y)
    )
    .Distinct()
    .ToList();
}
public class Pair : Tuple<int, int>
{
    public Pair(int item1, int item2) : base(item1, item2) { }

    public override bool Equals(object pair)
    {
        // Treat (x, y) and (y, x) as the same pair so Distinct() drops mirrored duplicates.
        Pair dest = pair as Pair;
        if (dest == null) return false;
        return (dest.Item1 == Item1 && dest.Item2 == Item2)
            || (dest.Item1 == Item2 && dest.Item2 == Item1);
    }

    public override int GetHashCode()
    {
        // Order-independent hash, consistent with the symmetric Equals above.
        return Item1 + Item2;
    }
}
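For example (my own test input, not from the original question), the custom equality is what collapses the mirrored pairs:
int[] arr = { 1, 9, 3, 7 };
foreach (var p in PairsThatSumToN(arr, 10))
    Console.WriteLine("(" + p.Item1 + ", " + p.Item2 + ")");
// The join yields (1, 9), (9, 1), (3, 7) and (7, 3); Distinct() keeps one of each
// mirrored pair, so only (1, 9) and (3, 7) are printed.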