I tried to solve this in a functional way, but am not having much success.
Suppose there is a list of lists, and we need to pick only the sublists that have a unique entry at a specific position.
For example, given a matrix, we want to select only the rows that have unique elements in the first column.
Here is an example:
INPUT:
list = {{1, 2}, {1, 3}, {4, 5}}
I'd like the output to be
list = {{1, 2}, {4, 5}}
It does not matter which 'row' is removed; the first one is OK, but any will do.
I tried Select, DeleteCases, DeleteDuplicates, Union, and a few other things, but could not get them to work. I do not know how to tell Mathematica to look only for a 'unique' element. Union comes close, but it compares the complete sublists, i.e. I do not know what to write as the criterion, as in
DeleteDuplicates[list, <now what?> ]
For reference, this is how I do the above in Matlab:
EDU>> A=[1 2;1 3;4 5]
A =
1 2
1 3
4 5
EDU>> [B,I,J]=unique(A(:,1));
EDU>> A(I,:)
ans =
1 3
4 5
Thanks.
Here is one way:
DeleteDuplicates[list, First@#1 === First@#2 &]
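Applied to the sample list from the question, this keeps the first row for each distinct first element:

```mathematica
list = {{1, 2}, {1, 3}, {4, 5}};

(* keep only rows whose first element has not been seen before *)
DeleteDuplicates[list, First@#1 === First@#2 &]

(* {{1, 2}, {4, 5}} *)
```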
EDIT
Note that the timings and discussion below are based on M7.
On reflection, I found a solution which will be (at least) an order of magnitude faster for large lists, and sometimes two orders of magnitude faster, for this particular case (a better way to put it is that the solution below has a different computational complexity):
Clear[delDupBy];
delDupBy[nested_List, n_Integer] :=
  Module[{parts = nested[[All, n]], ord, unpos},
    (* positions that sort the keys in column n *)
    ord = Ordering[parts];
    (* starting position of each run of equal keys in the sorted order *)
    unpos = Most@Accumulate@Prepend[Map[Length, Split@parts[[ord]]], 1];
    nested[[Sort@ord[[unpos]]]]];
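To see how delDupBy works, here is a trace of the intermediate values on the small example from the question:

```mathematica
list = {{1, 2}, {1, 3}, {4, 5}};

parts = list[[All, 1]]    (* {1, 1, 4} -- the keys *)
ord = Ordering[parts]     (* {1, 2, 3} -- positions that sort the keys *)
Split[parts[[ord]]]       (* {{1, 1}, {4}} -- runs of equal keys *)

(* cumulative run lengths give the start of each run: {1, 3} *)
Most@Accumulate@Prepend[Map[Length, Split[parts[[ord]]]], 1]

list[[{1, 3}]]            (* {{1, 2}, {4, 5}} *)
```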
Benchmarks:
In[406]:=
largeList = RandomInteger[{1,15},{50000,2}];
In[407]:= delDupBy[largeList,1]//Timing
Out[407]= {0.016,{{13,4},{12,1},{1,6},{6,13},{10,12},{7,15},{8,14},
{14,4},{4,1},{11,9},{5,11},{15,4},{2,7},{3,2},{9,12}}}
In[408]:= DeleteDuplicates[largeList,First@#1===First@#2&]//Timing
Out[408]= {1.265,{{13,4},{12,1},{1,6},{6,13},{10,12},{7,15},{8,14},{14,4},
{4,1},{11,9},{5,11},{15,4},{2,7},{3,2},{9,12}}}
This is particularly remarkable because DeleteDuplicates is a built-in function. My blind guess is that DeleteDuplicates with a user-defined test uses a quadratic-time pairwise comparison algorithm, while delDupBy is n*log n in the size of the list.
I think this is an important lesson: one should pay attention when using built-in functions such as Union, Sort, DeleteDuplicates, etc. with custom tests. I discussed this more extensively in this MathGroup thread, where there are also other insightful replies.
Finally, let me mention that exactly this question has been asked before (with the emphasis on efficiency) here. I will reproduce a solution I gave for the case when the first (or, generally, n-th) elements are positive integers (generalizing to arbitrary integers is straightforward):
Clear[sparseArrayElements];
(* extract the nonzero values stored in a SparseArray's internal structure *)
sparseArrayElements[HoldPattern[SparseArray[u___]]] := {u}[[4, 3]]

Clear[deleteDuplicatesBy];
Options[deleteDuplicatesBy] = {Ordered -> True, Threshold -> 1000000};
deleteDuplicatesBy[data_List, n_Integer, opts___?OptionQ] :=
  Module[{fdata = data[[All, n]], parr,
      rlen = Range[Length[data], 1, -1],
      preserveOrder = Ordered /. Flatten[{opts}] /. Options[deleteDuplicatesBy],
      threshold = Threshold /. Flatten[{opts}] /. Options[deleteDuplicatesBy], dim},
    dim = Max[fdata];
    (* preallocate a lookup table: dense for small key ranges, sparse otherwise *)
    parr = If[dim < threshold, Table[0, {dim}], SparseArray[{}, dim, 0]];
    (* assign row positions in reverse order, so the first occurrence of each key wins *)
    parr[[fdata[[rlen]]]] = rlen;
    parr = sparseArrayElements@If[dim < threshold, SparseArray@parr, parr];
    data[[If[preserveOrder, Sort@parr, parr]]]
  ];
The way this works is to use the first (or, generally, n-th) elements as positions in a huge table we preallocate, exploiting the fact that they are positive integers. This can give us crazy performance in some cases. Observe:
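To make the mechanism concrete, here is a stripped-down sketch of the same trick on the small example, using Select to pull out the nonzero entries instead of the internal SparseArray representation used above:

```mathematica
data = {{1, 2}, {1, 3}, {4, 5}};
keys = data[[All, 1]];                (* {1, 1, 4} *)

parr = Table[0, {Max[keys]}];         (* one slot per possible key value *)
rlen = Range[Length[data], 1, -1];    (* {3, 2, 1} *)

(* assign row positions in reverse order; for a repeated key, the last
   assignment (i.e. the earliest row) overwrites the others *)
parr[[keys[[rlen]]]] = rlen;          (* parr is now {1, 0, 0, 3} *)

data[[Sort@Select[parr, # > 0 &]]]    (* {{1, 2}, {4, 5}} *)
```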
In[423]:= hugeList = RandomInteger[{1,1000},{500000,2}];
In[424]:= delDupBy[hugeList,1]//Short//Timing
Out[424]= {0.219,{{153,549},{887,328},{731,825},<<994>>,{986,150},{92,581},{988,147}}}
In[430]:= deleteDuplicatesBy[hugeList,1]//Short//Timing
Out[430]= {0.032,{{153,549},{887,328},{731,825},<<994>>,{986,150},{92,581},{988,147}}}