
Query a list for only duplicates

Tags: c#, list, linq

I have a List of type string in a .NET 3.5 project. The list has thousands of strings in it, but for the sake of brevity we're going to say that it just has 5 strings in it.

List<string> lstStr = new List<string>() {
    "Apple", "Banana", "Coconut", "Coconut", "Orange" };

Assume that the list is sorted (as you can tell above). What I need is a LINQ query that will remove all strings that are not duplicates. So the result would leave me with a list that only contains the two "Coconut" strings.

Is this possible to do with a LINQ query? If it is not then I'll have to resort to some complex for loops, which I can do, but I didn't want to unless I had to.

Jagd asked Aug 12 '10 18:08



2 Answers

Here is code for finding duplicates in an array (shown with ints; the same query works for strings):

int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
var duplicates = listOfItems
    .GroupBy(i => i)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key);
foreach (var d in duplicates)
    Console.WriteLine(d);
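
The query above yields each repeated value once. If the goal is to keep every occurrence of a duplicated string, as the question asks, one possible sketch is to flatten the surviving groups back out with SelectMany. The names DuplicateFinder and OnlyDuplicates are made up for illustration and are not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not from the original answer.
public static class DuplicateFinder
{
    // Returns every element whose value occurs more than once,
    // keeping all occurrences rather than one per distinct value.
    public static List<string> OnlyDuplicates(IEnumerable<string> source)
    {
        return source
            .GroupBy(s => s)              // bucket equal strings together
            .Where(g => g.Count() > 1)    // keep buckets with two or more members
            .SelectMany(g => g)           // flatten the buckets back into items
            .ToList();
    }
}
```

Called on the question's lstStr, DuplicateFinder.OnlyDuplicates(lstStr) should return a list holding just the two "Coconut" entries. This approach does not rely on the list being sorted.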
Pranay Rana answered Oct 25 '22 05:10


var dupes = lstStr.Where(x => lstStr.Sum(y => y==x ? 1 : 0) > 1);

OR

var dupes = lstStr.Where((x,i) => (   (i > 0 && x==lstStr[i-1])
                                   || (i < lstStr.Count-1 && x==lstStr[i+1])));

Note that the first one enumerates the whole list for every element, which takes O(n²) time (but doesn't assume a sorted list). The second one is O(n) (and assumes a sorted list), since it only compares each element with its immediate neighbours.
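
If the list is not sorted and O(n²) is too slow, a middle ground is to count occurrences in a dictionary first and then filter. The following is only a sketch under a few assumptions: the list contains no null entries, hashing gives expected constant-time lookups, and the names DuplicateCounter and OnlyDuplicates are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not from the original answer.
public static class DuplicateCounter
{
    // Two passes: count each value, then keep the items whose count
    // exceeds one. Expected O(n) overall; no sorted input required.
    // Assumes items contains no null strings (Dictionary rejects null keys).
    public static List<string> OnlyDuplicates(IList<string> items)
    {
        var counts = new Dictionary<string, int>();
        foreach (var s in items)
        {
            int n;
            counts.TryGetValue(s, out n);   // n is 0 when s has not been seen yet
            counts[s] = n + 1;
        }

        // Preserve the original order and every occurrence of a repeated value.
        return items.Where(s => counts[s] > 1).ToList();
    }
}
```

This trades extra memory for the dictionary against the neighbour comparison used in the second query, so for a list that is already sorted the second query remains the lighter option.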

Mark Cidade answered Oct 25 '22 05:10