How to find most frequent values in numpy ndarray?

I have a NumPy ndarray of shape (30, 480, 640). Axes 1 and 2 represent locations (latitude and longitude), and axis 0 holds the actual data points. I want the most frequent value along axis 0 at each location, that is, to construct a new array of shape (1, 480, 640). For example:

>>> data
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[40, 40, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

(perform calculation)

>>> new_data 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]]])

The data points will contain negative and positive floating-point numbers. How can I perform such a calculation? Thanks a lot!

I tried numpy.unique, but I got "TypeError: unique() got an unexpected keyword argument 'return_inverse'". I'm using NumPy version 1.2.1 installed on Unix, and it doesn't support return_inverse. I also tried mode, but it takes forever to process such a large amount of data. Is there an alternative way to get the most frequent values? Thanks again.

asked Sep 06 '12 09:09

oops

1 Answer

To find the most frequent value of a flat array, use unique, bincount and argmax:

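import numpy as np
arr = np.array([5, 4, -2, 1, -2, 0, 4, 4, -6, -1])
# u: sorted unique values; indices: position in u of each element of arr
u, indices = np.unique(arr, return_inverse=True)
# Count how often each unique value occurs and pick the most common one
u[np.argmax(np.bincount(indices))]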
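
The same three steps can be wrapped in a small helper. This is only a minimal sketch: the name most_frequent is hypothetical and is not a NumPy function. Ties go to the smallest value, because unique returns its values sorted and argmax returns the first maximum. Negative and floating-point values work the same way as integers.

```python
import numpy as np

def most_frequent(values):
    """Return the most frequent element of a flat array-like (hypothetical helper)."""
    values = np.asarray(values).ravel()
    # u holds the sorted unique values; indices maps each element to its slot in u
    u, indices = np.unique(values, return_inverse=True)
    # counts[i] is the number of occurrences of u[i]
    counts = np.bincount(indices)
    return u[np.argmax(counts)]

arr = np.array([5, 4, -2, 1, -2, 0, 4, 4, -6, -1])
print(most_frequent(arr))                      # 4 (occurs three times)
print(most_frequent([-1.5, 2.25, -1.5, 0.0]))  # -1.5
```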

To work with a multidimensional array, unique needs no special handling (it flattens its input, so one call covers the whole array), but bincount has to be applied along the chosen axis with apply_along_axis:

arr = np.array([[5, 4, -2, 1, -2, 0, 4, 4, -6, -1],
                [0, 1,  2, 2,  3, 4, 5, 6,  7,  8]])
axis = 1
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
                                None, np.max(indices) + 1), axis=axis)]
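
For reference, the expression above can be packaged as a function and checked on the two-row example. This is a sketch, and mode_along_axis is a hypothetical name. Note that the intermediate count array has one slot per distinct value in the whole array, so its memory use grows with the number of distinct values.

```python
import numpy as np

def mode_along_axis(a, axis):
    """Most frequent value along `axis` (hypothetical helper name)."""
    a = np.asarray(a)
    # One global set of unique values; indices says which one each element is
    u, indices = np.unique(a, return_inverse=True)
    indices = indices.reshape(a.shape)
    # Replace `axis` by a histogram of length u.size for every 1-D slice
    counts = np.apply_along_axis(np.bincount, axis, indices, minlength=u.size)
    # The position of the largest count along `axis` selects the value in u
    return u[np.argmax(counts, axis=axis)]

arr = np.array([[5, 4, -2, 1, -2, 0, 4, 4, -6, -1],
                [0, 1,  2, 2,  3, 4, 5, 6,  7,  8]])
row_modes = mode_along_axis(arr, axis=1)
print(row_modes)  # [4 2]
```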

With your data:

data = np.array([
   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[40, 40, 42, 43, 44],
    [45, 46, 47, 48, 49],
    [50, 51, 52, 53, 54],
    [55, 56, 57, 58, 59]]])
axis = 0
u, indices = np.unique(data, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(data.shape),
                                None, np.max(indices) + 1), axis=axis)]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
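
With floating-point data of shape (30, 480, 640), the count array built by the bincount approach has one slot per distinct value at every location, which can become very large. As an alternative technique (not the method used in this answer), one can sort along the axis and track the longest run of equal values. The sketch below assumes axis 0, and mode_axis0_sorted is a hypothetical name. Ties go to the smallest value, as with the unique and argmax version. The last step adds back a leading axis to give the (1, rows, cols) shape asked for in the question.

```python
import numpy as np

def mode_axis0_sorted(a):
    """Most frequent value along axis 0 via sorting (hypothetical helper)."""
    s = np.sort(np.asarray(a), axis=0)
    best_val = s[0].copy()                       # best value found so far
    best_cnt = np.ones(s.shape[1:], dtype=int)   # length of its run
    run_cnt = np.ones(s.shape[1:], dtype=int)    # length of the current run
    for k in range(1, s.shape[0]):
        same = s[k] == s[k - 1]
        # Extend the current run where the value repeats, otherwise restart at 1
        run_cnt = np.where(same, run_cnt + 1, 1)
        # Strict comparison keeps the smallest value when counts are tied
        better = run_cnt > best_cnt
        best_val = np.where(better, s[k], best_val)
        best_cnt = np.where(better, run_cnt, best_cnt)
    return best_val

# The example array from the question, converted to floats
data = np.array([
    [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19]],
    [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19]],
    [[40, 40, 42, 43, 44], [45, 46, 47, 48, 49], [50, 51, 52, 53, 54], [55, 56, 57, 58, 59]],
]).astype(float)

new_data = mode_axis0_sorted(data)[np.newaxis]
print(new_data.shape)  # (1, 4, 5)
```

The Python loop runs once per entry along axis 0 (30 iterations for the array in the question); everything inside it is a whole-array operation.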

NumPy 1.2, really? You can emulate np.unique(arr, return_inverse=True) reasonably efficiently using np.searchsorted (it adds an extra O(n log n) step, so it shouldn't change the performance significantly):

u = np.unique(arr)
indices = np.searchsorted(u, arr.flat)
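
To make the fallback concrete, the sketch below builds the inverse indices with searchsorted and checks them against return_inverse on a NumPy that supports it. It uses arr.ravel() in place of arr.flat, which is a choice made here for clarity. Because u is sorted and contains every element of arr, searchsorted returns each element's exact position in u.

```python
import numpy as np

arr = np.array([[5, 4, -2, 1, -2, 0, 4, 4, -6, -1],
                [0, 1,  2, 2,  3, 4, 5, 6,  7,  8]])

# Fallback: sorted unique values, then a binary search for every element
u = np.unique(arr)
indices = np.searchsorted(u, arr.ravel())

# Reference result from return_inverse (only for comparison)
_, expected = np.unique(arr, return_inverse=True)
print(np.array_equal(indices, expected.ravel()))  # True

# The fallback indices plug into bincount/argmax exactly as before
print(u[np.argmax(np.bincount(indices))])  # 4, the most frequent value overall
```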

answered Oct 28 '22 01:10

ecatmur
