Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find most frequent values in numpy ndarray?

Tags:

I have a numpy ndarray with shape of (30,480,640), the 1th and 2th axis representing locations(latitude and longitute), the 0th axis contains actual data points.I want to use the most frequent value along the 0th axis at each location, which is to construct a new array with shape of (1,480,640).ie:

>>> data
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[40, 40, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

(perform calculation)

>>> new_data 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]]])

The data points will contain negtive and positive floating numbers. How can I perform such calculations? Thanks a lot!

I tried with numpy.unique,but I got "TypeError: unique() got an unexpected keyword argument 'return_inverse'".I'm using numpy version 1.2.1 installed on Unix and it doesn't support return_inverse..I also tried mode,but it takes forever to process such large amount of data...so is there an alternative way to get the most frequent values? Thanks again.

like image 472
oops Avatar asked Sep 06 '12 09:09

oops


People also ask

How do I find the most frequent value in a list Python?

Make use of Python Counter which returns count of each element in the list. Thus, we simply find the most common element by using most_common() method.

Where can I find unique values in Ndarray?

With the help of np. unique() method, we can get the unique values from an array given as parameter in np. unique() method. Return : Return the unique of an array.


1 Answers

To find the most frequent value of a flat array, use unique, bincount and argmax:

arr = np.array([5, 4, -2, 1, -2, 0, 4, 4, -6, -1])
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.bincount(indices))]

To work with a multidimensional array, we don't need to worry about unique, but we do need to use apply_along_axis on bincount:

arr = np.array([[5, 4, -2, 1, -2, 0, 4, 4, -6, -1],
                [0, 1,  2, 2,  3, 4, 5, 6,  7,  8]])
axis = 1
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
                                None, np.max(indices) + 1), axis=axis)]

With your data:

data = np.array([
   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[40, 40, 42, 43, 44],
    [45, 46, 47, 48, 49],
    [50, 51, 52, 53, 54],
    [55, 56, 57, 58, 59]]])
axis = 0
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
                                None, np.max(indices) + 1), axis=axis)]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

NumPy 1.2, really? You can approximate np.unique(return_inverse=True) reasonably efficiently using np.searchsorted (it's an additional O(n log n), so shouldn't change the performance significantly):

u = np.unique(arr)
indices = np.searchsorted(u, arr.flat)
like image 78
ecatmur Avatar answered Oct 28 '22 01:10

ecatmur