
How can I use the unique(a, 'rows') from MATLAB in Python?

I'm translating some stuff from MATLAB to the Python language.

NumPy has a unique(a) function, but the MATLAB program calls unique with the 'rows' option, so the result is a little different: MATLAB returns the unique rows, while NumPy returns the unique elements.

Is there a similar command in Python or should I make some algorithm that does the same thing?

Rodrigo Forti asked Jul 21 '11 13:07



2 Answers

Assuming your 2D array is stored in the usual C (row-major) order, so that each row is a sub-array of the main array, or that you transpose it beforehand otherwise, you could do something like:

>>> import numpy as np
>>> a = np.array([[1, 2, 3], [2, 3, 4], [1, 2, 3], [3, 4, 5]])
>>> a
array([[1, 2, 3],
       [2, 3, 4],
       [1, 2, 3],
       [3, 4, 5]])
>>> np.array([np.array(x) for x in set(tuple(x) for x in a)]) # or "list(x) for x in set[...]"
array([[3, 4, 5],
       [2, 3, 4],
       [1, 2, 3]])

Of course, a set is unordered, so the rows can come back in any order; this doesn't work if you need the unique rows in their original order.
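If the original order matters, one possible sketch (assuming NumPy 1.13 or later, where np.unique accepts an axis argument) is to ask np.unique for the index of each row's first occurrence and then sort those indices. The helper name unique_rows_stable is made up for this example and is not part of NumPy.

```python
import numpy as np

def unique_rows_stable(a):
    """Return the unique rows of a 2D array in order of first appearance.

    Hypothetical helper; assumes NumPy >= 1.13 for the axis argument.
    """
    a = np.asarray(a)
    # axis=0 makes np.unique treat each row as one item; return_index
    # gives the position of the first occurrence of each unique row.
    _, first_idx = np.unique(a, axis=0, return_index=True)
    return a[np.sort(first_idx)]

a = np.array([[3, 4, 5], [1, 2, 3], [3, 4, 5], [2, 3, 4]])
sorted_rows = np.unique(a, axis=0)   # unique rows, sorted lexicographically
stable_rows = unique_rows_stable(a)  # unique rows, original order kept
```

Here sorted_rows starts with [1, 2, 3], while stable_rows starts with [3, 4, 5], the first row of a.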


By the way, to emulate something like unique(a, 'columns'), you'd just transpose the original array, do the step shown above, and then transpose back.
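The transpose trick can be sketched as follows. This version swaps the set-based step for np.unique with axis=0 (an assumption: NumPy 1.13 or later), so the columns come back sorted; unique_columns is a name invented for the example.

```python
import numpy as np

def unique_columns(a):
    """Return the unique columns of a 2D array (hypothetical helper)."""
    a = np.asarray(a)
    # Transpose so columns become rows, take unique rows, transpose back.
    return np.unique(a.T, axis=0).T

b = np.array([[1, 2, 1],
              [3, 4, 3]])
cols = unique_columns(b)
# Passing axis=1 directly should give the same result without transposing.
same = np.array_equal(cols, np.unique(b, axis=1))
```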

JAB answered Nov 15 '22 21:11


You can try:

import numpy

# your_arr is your own 2D input array
wrk_arr = numpy.asarray(your_arr)
idx = numpy.arange(len(wrk_arr))  # original indices of the rows still kept
ii = 0
while ii < len(wrk_arr):
    # flag every row equal to the candidate row ii
    i_dup = numpy.all(wrk_arr == wrk_arr[ii, :], axis=1)
    i_dup[ii] = False  # keep the candidate itself
    # drop the later duplicates from both the array and the index list
    wrk_arr = wrk_arr[~i_dup, :]
    idx = idx[~i_dup]
    ii += 1

The result is wrk_arr, which holds the unique rows of your_arr in order of first appearance (it is not sorted). The relation is:

your_arr[idx,:] == wrk_arr

It works like MATLAB's unique(a, 'rows', 'stable') in the sense that the returned array (wrk_arr) keeps the order of the original array (your_arr); plain unique(a, 'rows') sorts the rows instead. The idx array contains the indices of first appearance, whereas older MATLAB releases returned the LAST appearance by default.
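The relation between your_arr, idx and wrk_arr can be checked with a small self-contained sketch. It uses a boolean keep-mask rather than shrinking the array on every pass, so it is a variation on the loop above and not the same code; first_unique_rows and the sample data are made up for illustration.

```python
import numpy as np

def first_unique_rows(arr):
    """Return (unique rows in first-appearance order, their original indices)."""
    arr = np.asarray(arr)
    keep = np.ones(len(arr), dtype=bool)
    for i in range(len(arr)):
        if keep[i]:
            # rows equal to row i, ignoring row i itself and anything before it
            dup = np.all(arr == arr[i], axis=1)
            dup[: i + 1] = False
            keep &= ~dup
    idx = np.flatnonzero(keep)
    return arr[idx], idx

your_arr = np.array([[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]])
wrk_arr, idx = first_unique_rows(your_arr)
relation_holds = np.array_equal(your_arr[idx, :], wrk_arr)
```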

In my experience it ran about as fast as MATLAB on a 10000 x 4 matrix.

And a transpose will do the trick for the column case.

David New2Py answered Nov 15 '22 21:11