Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the essential functions to find duplicate elements within a list?

Tags:

f#

What are the essential functions to find duplicate elements within a list?

Translated, how can I simplify the following function:

let numbers = [ 3;5;5;8;9;9;9 ]

let getDuplicates = numbers |> List.groupBy id
                            |> List.map snd
                            |> List.filter (fun set -> set.Length > 1)
                            |> List.map (fun set -> set.[0])

I'm sure this is a duplicate. However, I am unable to locate the question on this site.

UPDATE

let getDuplicates numbers =

    numbers |> List.groupBy id
            |> List.choose (fun (k,v) -> match v.Length with
                                         | x when x > 1 -> Some k
                                         | _            -> None)
like image 585
Scott Nimrod Avatar asked Apr 06 '16 15:04

Scott Nimrod


People also ask

Which function is use to count the duplicate values of a list?

In Excel, you can use COUNTIF function to count the duplicates.


2 Answers

Simplifying your function:

Whenever you have a filter followed by a map, you can probably replace the pair with a choose. The purpose of choose is to run a function for each value in the list, and return only the items which return Some value (None values are removed, which is the filter portion). Whatever value you put inside Some is the map portion:

let getDuplicates = numbers |> List.groupBy id
                            |> List.map snd
                            |> List.choose( fun( set ) ->
                               if set.Length > 1
                                  then Some( set.[0] )
                                  else None )

We can take it one additional step by removing the map. In this case, keeping the tuple which contains the key is helpful, because it eliminates the need to get the first item of the list:

let getDuplicates = numbers |> List.groupBy id
                            |> List.choose( fun( key, set ) ->
                               if set.Length > 1
                                  then Some key
                                  else None )

Is this simpler than the original? Perhaps. Because choose combines two purposes, it is by necessity more complex than those purposes kept separate (the filter and the map), and this makes it harder to understand at a glance, perhaps undoing the more "simplified" code. More on this later.

Decomposing the concept

Simplifying the code wasn't the direct question, though. You asked about functions useful in finding duplicates. At a high level, how do you find a duplicate? It depends on your algorithm and specific needs:

  • Your given algorithm uses the "put items in buckets based on their value", and "look for buckets with more than one item". This is a direct match to List.groupBy and List.choose (or filter/map)
  • A different algorithm could be to "iterate through all items", "modify an accumulator as we see each", then "report all items which have been seen multiple times". This is kind of like the first algorithm, where something like List.fold is replacing List.groupBy, but if you need to drag some other kind of state along, it may be helpful.
  • Perhaps you need to know how many times there are duplicates. A different algorithm satisfying these requirements may be "sort the items so they are always ascending", and "flag if the next item is the same as the current item". In this case, you have a List.sort followed by a List.toSeq then Seq.windowed:

    let getDuplicates = numbers |> List.sort
                                |> List.toSeq
                                |> Seq.windowed 2
                                |> Seq.choose( function
                                  | [|x; y|] when x = y -> Some x
                                  | _ -> None )
    

    Note that this returns a sequence with [5; 9; 9], informing you that 9 is duplicated twice.

  • These were algorithms mostly based on List functions. There are already two answers, one mutable, the other not, which are based on sets and existence.

My point is, a complete list of functions helpful to finding duplicates would read like a who's who list of existing collection functions -- it all depends on what you're trying to do and your specific requirements. I think your choice of List.groupBy and List.choose is probably about as simple as it gets.

Simplifying for maintainability

The last thought on simplification is to remember that simplifying code will improve the readability of your code to a certain extent. "Simplifying" beyond that point will most likely involve tricks, or obscure intent. If I were to look back at a sample of code I wrote even several weeks and a couple of projects ago, the shortest and perhaps simplest code would probably not be the easiest to understand. Thus the last point -- simplifying future code maintainability may be your goal. If this is the case, your original algorithm modified only keeping the groupBy tuple and adding comments as to what each step of the pipeline is doing may be your best bet:

// combine numbers into common buckets specified by the number itself
let getDuplicates = numbers |> List.groupBy id
                            // only look at buckets with more than one item
                            |> List.filter( fun (_,set) -> set.Length > 1)
                            // change each bucket to only its key
                            |> List.map( fun (key,_) -> key )

The original question comments already show that your code was unclear to people unfamiliar with it. Is this a question of experience? Definitely. But, regardless of whether we work on a team, or are lone wolves, optimizing code (where possible) for quick understanding should probably be close to everyone's top priority. (climbing down off sandbox...) :)

Regardless, best of luck.

like image 82
chryosolo Avatar answered Sep 27 '22 20:09

chryosolo


If you don't mind using a mutable collection in a local scope, this could do it:

open System.Collections.Generic

let getDuplicates numbers =
    let known = HashSet()
    numbers |> List.filter (known.Add >> not) |> set
like image 32
Vandroiy Avatar answered Sep 27 '22 19:09

Vandroiy