I have a class called Customer that has several string properties like firstName, lastName, email, etc. I read in the customer information from a CSV file that creates an array of the class: Customer[] customers.
I need to remove the duplicate customers having the same email address, leaving only 1 customer record for each particular email address.
I have done this using 2 loops but it takes nearly 5 minutes as there are usually 50,000+ customer records. Once I am done removing the duplicates, I need to write the customer information to another csv file (no help needed here).
If I did a Distinct in a loop, how would I remove the other string variables that are a part of the class for that particular customer as well?
Thanks, Andrew
With LINQ, you can do this in O(n) time (a single-level loop) with GroupBy:
    var uniquePersons = persons.GroupBy(p => p.Email)
                               .Select(grp => grp.First())
                               .ToArray();
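To tie this back to the question's Customer[] array, here is a minimal self-contained sketch. The Customer class shape, the DedupExample and DistinctByEmail names, and the case-insensitive comparison of email addresses are all assumptions made for illustration; drop the comparer argument if emails should be compared exactly.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the asker's Customer class.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

public static class DedupExample
{
    // Keeps the first customer encountered for each email address.
    // Assumption: emails differing only by case count as duplicates.
    public static Customer[] DistinctByEmail(Customer[] customers)
    {
        return customers
            .GroupBy(c => c.Email, StringComparer.OrdinalIgnoreCase)
            .Select(grp => grp.First())
            .ToArray();
    }
}
```

Because each group holds whole Customer objects, taking the first element of a group keeps every other property (first name, last name, and so on) of that record, which is what a plain Distinct on the email strings alone would lose.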
Update
A bit on the O(n) behavior of GroupBy.
GroupBy is implemented in LINQ (Enumerable.cs) as follows: the IEnumerable is iterated only once to create the groupings. A hash of the provided key (e.g. Email here) is used to find unique keys, and each element is added to the Grouping corresponding to its key.
Please see this GetGrouping code, and some old posts for reference.
Then Select is also O(n), making the above code O(n) overall (assuming the usual constant expected time for hash lookups).
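The single-pass, hash-based idea described above can be sketched with a plain Dictionary. This is a simplified illustration only, not the actual Enumerable.cs source; the GroupingSketch and GroupInOnePass names are hypothetical, and unlike GroupBy this sketch assumes keys are never null.

```csharp
using System;
using System.Collections.Generic;

public static class GroupingSketch
{
    // Groups items by key in one pass over the source.
    // Assumption: keySelector never returns null (Dictionary rejects null keys).
    public static List<List<T>> GroupInOnePass<T>(
        IEnumerable<T> source, Func<T, string> keySelector)
    {
        var index = new Dictionary<string, List<T>>(); // key -> its group
        var groups = new List<List<T>>();              // groups in first-seen order

        foreach (var item in source)
        {
            var key = keySelector(item);

            // Hash lookup: expected constant time per element.
            if (!index.TryGetValue(key, out var group))
            {
                group = new List<T>();
                index[key] = group;
                groups.Add(group);
            }
            group.Add(item);
        }
        return groups;
    }
}
```

Each element is visited once and costs one hash lookup, so the whole pass is linear in the number of elements, compared with the quadratic cost of comparing every record against every other record in two nested loops.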
Update 2
To handle empty/null values.
So, if there are instances where the value of Email is null or empty, the simple GroupBy will keep just one object with a null Email and one with an empty Email.
One quick way to keep all the objects with a null/empty value is to generate unique keys at run time for those objects, like this:
    var tempEmailIndex = 0;
    var uniqueNullAndEmpty = persons
        // Give each null/empty email its own generated key so it forms its own group.
        .GroupBy(p => string.IsNullOrEmpty(p.Email)
            ? (++tempEmailIndex).ToString() : p.Email)
        .Select(grp => grp.First())
        .ToArray();
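As an alternative to the counter-based keys above, one could split the records first and only group those that actually have an email. This is a different technique from the snippet above, sketched under assumptions: the Person class and the NullSafeDedup and DistinctKeepingBlanks names are hypothetical. It avoids mutating a variable inside the key selector and cannot collide with a real email value such as "1", but it changes the output order: records with an email come first, followed by those without.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the element type of "persons".
public class Person
{
    public string Name { get; set; }
    public string Email { get; set; }
}

public static class NullSafeDedup
{
    // De-duplicates by Email, but keeps every record whose Email is null or empty.
    public static Person[] DistinctKeepingBlanks(Person[] persons)
    {
        // One record per distinct non-empty email.
        var withEmail = persons
            .Where(p => !string.IsNullOrEmpty(p.Email))
            .GroupBy(p => p.Email)
            .Select(grp => grp.First());

        // Records without an email are never treated as duplicates.
        var withoutEmail = persons.Where(p => string.IsNullOrEmpty(p.Email));

        return withEmail.Concat(withoutEmail).ToArray();
    }
}
```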