
Show duplicates in Mathematica

In Mathematica I have a list:

x = {1,2,3,3,4,5,5,6}

How can I make a list of the duplicates? For example:

{3,5}

I have been looking through the Lists as Sets documentation for something like Except[] that works on lists, so that I could do:

unique = Union[x]
duplicates = MyExcept[x,unique]

(Of course, if x had more than two copies of an element, say {1,2,2,2,3,4,4}, then the output would be {2,2,4}, but an additional Union[] would fix that.)

But there wasn't anything like that (if I understood all the functions there correctly).

So, how can I do that?

Martin Janiczek asked Oct 27 '09 14:10


7 Answers

Here are several faster variations of the Tally method.

f4 uses "tricks" given by Carl Woll and Oliver Ruebenkoenig on MathGroup.

f2 = Tally@# /. {{_, 1} :> Sequence[], {a_, _} :> a} &;

f3 = Pick[#, Unitize[#2 - 1], 1] & @@ Transpose@Tally@# &;

f4 = # ~Extract~ SparseArray[Unitize[#2 - 1]]["NonzeroPositions"] & @@ Transpose@Tally@# &;

Speed comparison (f1 included for reference):

a = RandomInteger[100000, 25000];

f1 = Part[Select[Tally@#, Part[#, 2] > 1 &], All, 1] &;

First@Timing@Do[#@a, {50}] & /@ {f1, f2, f3, f4, Tally}

SameQ @@ (#@a &) /@ {f1, f2, f3, f4}

Out[]= {3.188, 1.296, 0.719, 0.375, 0.36}

Out[]= True

It is amazing to me that f4 has almost no overhead relative to a pure Tally!
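
To see why f3 works, it may help to trace it on the list from the question. This is only an illustrative sketch; the intermediate names vals, counts and mask are introduced here and are not part of the original answer.

```mathematica
x = {1, 2, 3, 3, 4, 5, 5, 6};

(* Tally gives {value, count} pairs; Transpose separates values from counts *)
{vals, counts} = Transpose[Tally[x]];
(* vals   = {1, 2, 3, 4, 5, 6}
   counts = {1, 1, 2, 1, 2, 1} *)

(* counts - 1 is zero exactly for values that occur once;
   Unitize maps every nonzero entry to 1 *)
mask = Unitize[counts - 1];
(* mask = {0, 0, 1, 0, 1, 0} *)

(* keep the values whose mask entry is 1 *)
Pick[vals, mask, 1]
(* {3, 5} *)
```

f4 builds the same mask but reads the positions of its nonzero entries from a SparseArray and passes them to Extract instead of calling Pick.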

Mr.Wizard answered Nov 20 '22 17:11


There are lots of ways to do list extraction like this; here's the first thing that came to my mind:

Part[Select[Tally@x, Part[#, 2] > 1 &], All, 1]

Or, more readably in pieces:

Tally@x
Select[%, Part[#, 2] > 1 &]
Part[%, All, 1]

which gives, respectively,

{{1, 1}, {2, 1}, {3, 2}, {4, 1}, {5, 2}, {6, 1}}
{{3, 2}, {5, 2}}
{3, 5}

Perhaps you can think of a more efficient (in time or code space) way :)

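By the way, the list does not need to be sorted for this to work: Tally counts equal elements wherever they occur and returns the distinct values in order of first appearance. Apply Sort to the result only if you want the duplicates in sorted order.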
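
One more compact variant, offered as a sketch rather than as part of the original answer, filters the Tally output with Cases and a pattern condition instead of Select and Part:

```mathematica
x = {1, 2, 3, 3, 4, 5, 5, 6};

(* keep the value v from each {v, n} pair whose count n exceeds 1 *)
Cases[Tally[x], {v_, n_ /; n > 1} :> v]
(* {3, 5} *)
```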

Will Robertson answered Nov 20 '22 19:11


Here's a way to do it in a single pass through the list:

collectDups[l_] := Block[{i}, i[n_]:= (i[n] = n; Unevaluated@Sequence[]); i /@ l]

For example:

collectDups[{1, 1, 6, 1, 3, 4, 4, 5, 4, 4, 2, 2}] --> {1, 1, 4, 4, 4, 2}

If you want the list of unique duplicates -- {1, 4, 2} -- then wrap the above in DeleteDuplicates, which is another single pass through the list (Union is less efficient as it also sorts the result).

collectDups[l_] := 
  DeleteDuplicates@Block[{i}, i[n_]:= (i[n] = n; Unevaluated@Sequence[]); i /@ l]

Will Robertson's solution is probably better just because it's more straightforward, but I think if you wanted to eke out more speed, this should win. But if you cared about that, you wouldn't be programming in Mathematica! :)
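
For comparison, here is a sketch of a different technique that also returns one copy of each duplicated element in order of first appearance. It assumes Mathematica 10 or later, where Counts and associations are available; the name dupKeys is made up for this example and does not appear in the answers.

```mathematica
(* Counts builds an association value -> multiplicity, keyed in order of first appearance *)
dupKeys[l_List] := Keys[Select[Counts[l], # > 1 &]]

dupKeys[{1, 1, 6, 1, 3, 4, 4, 5, 4, 4, 2, 2}]
(* {1, 4, 2} *)
```

No claim is made here about how its speed compares with the memoization approach above.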

dreeves answered Nov 20 '22 17:11


Using a solution like dreeves's, but returning only a single instance of each duplicated element, is a bit on the tricky side. One way of doing it is as follows:

collectDups1[l_] :=
  Module[{i, j},
    i[n_] := (i[n] := j[n]; Unevaluated@Sequence[]);
    j[n_] := (j[n] = Unevaluated@Sequence[]; n);
    i /@ l];

This doesn't precisely match the output produced by Will Robertson's (IMO superior) solution, because elements appear in the returned list in the order in which they can be determined to be duplicates. I'm not sure whether it really can be done in a single pass. All the ways I can think of involve, in effect, at least two passes, although one of them might only be over the duplicated elements.

Pillsy answered Nov 20 '22 19:11


Here is a version of Robertson's answer that uses 100% "postfix notation" for function calls.

identifyDuplicates[list_List, test_:SameQ] :=
 list //
    Tally[#, test] & //
   Select[#, #[[2]] > 1 &] & //
  Map[#[[1]] &, #] &

Mathematica's // is similar to the dot for method calls in other languages. For instance, if this were written in C# / LINQ style, it would resemble

list.Tally(test).Where(x => x[2] > 1).Select(x => x[1])

Note that C#'s Where is like MMA's Select, and C#'s Select is like MMA's Map.
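
As a small illustration of the postfix form (a sketch, not taken from the original answer): expr // f is just another way of writing f[expr], so a chain reads left to right in the order the functions are applied.

```mathematica
(* same as Reverse[Sort[{3, 1, 2}]] *)
{3, 1, 2} // Sort // Reverse
(* {3, 2, 1} *)

(* a pure function can be a step in the chain; # stands for the value so far *)
{1, 2, 3, 3, 4, 5, 5, 6} // Tally // Select[#, Last[#] > 1 &] &
(* {{3, 2}, {5, 2}} *)
```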

EDIT: added optional test function argument, defaulting to SameQ.

EDIT: here is a version that addresses my comment below and reports every element of each group. It takes a projector function, and two elements of the list are considered equivalent if the projector returns equal values for them. In effect, it finds the equivalence classes that have at least a given number of elements:

reportDuplicateClusters[list_List, projector_: (# &), 
  minimumClusterSize_: 2] :=
 GatherBy[list, projector] //
  Select[#, Length@# >= minimumClusterSize &] &

Here is a sample that groups pairs of integers by their first elements, considering two pairs equivalent if their first elements are equal:

reportDuplicateClusters[RandomInteger[10, {10, 2}], #[[1]] &]
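
Because RandomInteger gives different input on every run, a fixed input may make the behaviour easier to follow. The sketch below applies the same GatherBy and Select steps directly; the list pairs is made up for illustration.

```mathematica
pairs = {{1, "a"}, {2, "b"}, {1, "c"}, {3, "d"}, {2, "e"}};

(* group the pairs by their first element, then keep groups with at least 2 members *)
Select[GatherBy[pairs, First], Length[#] >= 2 &]
(* {{{1, "a"}, {1, "c"}}, {{2, "b"}, {2, "e"}}} *)
```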

Reb.Cabin answered Nov 20 '22 18:11


This thread seems old, but I've had to solve this myself.

This is kind of crude (it assumes tt is a sorted list of integers), but does this do it?

Union[Select[Table[If[tt[[n]] == tt[[n + 1]], tt[[n]], ""], {n, Length[tt] - 1}], IntegerQ]]
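
The same idea of comparing neighbouring elements can be sketched with Split, which groups runs of adjacent equal elements. This is an alternative formulation, not the code above; unlike that code, it sorts first and is not restricted to integers.

```mathematica
x = {1, 2, 3, 3, 4, 5, 5, 6};

(* Sort brings equal elements together; Split groups each run of equal neighbours *)
runs = Split[Sort[x]];
(* {{1}, {2}, {3, 3}, {4}, {5, 5}, {6}} *)

(* keep runs longer than one element and take one representative of each *)
First /@ Select[runs, Length[#] > 1 &]
(* {3, 5} *)
```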

magrew answered Nov 20 '22 19:11


Given a list A, get the distinct values in B:
B = DeleteDuplicates[A]
Get the duplicate values in C:
C = Complement[A,B]
Get the distinct values from the duplicate list in D:
D = DeleteDuplicates[C]

So for your example:
A = {1, 2, 2, 2, 3, 4, 4}
B = {1, 2, 3, 4}
C = {2, 2, 4}
D = {2, 4}

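So your answer would be DeleteDuplicates[Complement[x,DeleteDuplicates[x]]], where x is your list. I don't know Mathematica, so the syntax may or may not be correct here. In particular, this assumes that Complement removes only one instance of each matching element rather than acting as a set difference. I'm just going by the docs on the page you linked to.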
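
One caveat when trying this out: Complement treats its arguments as sets, so it returns only elements that are absent from the second list and does not produce the {2, 2, 4} shown in the example. A sketch of one way to get the intended "remove one instance of each value" step, using Gather and Rest (a substitution made here, not part of the answer above):

```mathematica
x = {1, 2, 2, 2, 3, 4, 4};

(* set difference: every element of x also occurs in DeleteDuplicates[x] *)
Complement[x, DeleteDuplicates[x]]
(* {} *)

(* Gather groups equal elements; Rest drops one copy from each group *)
extras = Flatten[Rest /@ Gather[x]];
(* {2, 2, 4} *)

DeleteDuplicates[extras]
(* {2, 4} *)
```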

Brian Schroth answered Nov 20 '22 19:11