Simple situation. I have a list of lists, almost table like, and I am trying to find out if any of the lists are duplicated.
Example:
List<List<int>> list = new List<List<int>>(){
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,4, 2, 4, 5, 6 },
new List<int>() {0 ,3 ,2, 5, 1, 6, 4 }
};
I would like to know that there are 4 total items, 2 of which are duplicates. I was thinking about doing something like a SQL checksum but I didn't know if there was a better/easier way.
I care about performance, and I care about ordering.
Additional Information That May Help
Let's try to get best performace. if n is number of lists and m is length of lists then we can get O(nm + nlogn + n) plus some probability of hash codes to be equal for different lists.
Major steps:
* this is important step. for simlicity you can calc hash as = ... ^ (list[i] << i) ^ (list[i + 1] << (i + 1))
Edit for those people that think that PLINQ can boost the thing, but not good algorythm. PLINQ can also be added here, because all the steps are easily parallelizable.
My code:
static public void Main()
{
List<List<int>> list = new List<List<int>>(){
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,4, 2, 4, 5, 6 },
new List<int>() {0 ,3 ,2, 5, 1, 6, 4 }
};
var hashList = list.Select((l, ind) =>
{
uint hash = 0;
for (int i = 0; i < l.Count; i++)
{
uint el = (uint)l[i];
hash ^= (el << i) | (el >> (32 - i));
}
return new {hash, ind};
}).OrderBy(l => l.hash).ToList();
//hashList.Sort();
uint prevHash = hashList[0].hash;
int firstInd = 0;
for (int i = 1; i <= hashList.Count; i++)
{
if (i == hashList.Count || hashList[i].hash != prevHash)
{
for (int n = firstInd; n < i; n++)
for (int m = n + 1; m < i; m++)
{
List<int> x = list[hashList[n].ind];
List<int> y = list[hashList[m].ind];
if (x.Count == y.Count && x.SequenceEqual(y))
Console.WriteLine("Dupes: {0} and {1}", hashList[n].ind, hashList[m].ind);
}
}
if (i == hashList.Count)
break;
if (hashList[i].hash != prevHash)
{
firstInd = i;
prevHash = hashList[i].hash;
}
}
}
Unless you're doing some seriously heavy lifting, perhaps the following simple code will work for you:
var lists = new List<List<int>>()
{
new List<int>() {0 ,1, 2, 3, 4, 5, 6 },
new List<int>() {0 ,1, 2, 3, 4, 5, 6 },
new List<int>() {0 ,1, 4, 2, 4, 5, 6 },
new List<int>() {0 ,3, 2, 5, 1, 6, 4 }
};
var duplicates = from list in lists
where lists.Except(new[] { list }).Any(l => l.SequenceEqual(list))
select list;
Obviously you could get better performance if you hand-tweak an algorithm such that you don't have to scan the lists each iteration, but there is something to be said for writing declarative, simpler code.
(Plus, thanks to the Awesomeness of LINQ®, by adding a .AsParallel() call to the above code, the algorithm will run on multiple cores, thus running potentially faster than the complex, hand-tweaked solutions mentioned in this thread.)
Something like this will give you the correct results:
List<List<int>> list = new List<List<int>>(){
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,4, 2, 4, 5, 6 },
new List<int>() {0 ,3 ,2, 5, 1, 6, 4 }
};
list.ToLookup(l => String.Join(",", l.Select(i => i.ToString()).ToArray()))
.Where(lk => lk.Count() > 1)
.SelectMany(group => group);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With