Numpy array loss of dimension when masking

I want to select certain elements of an array and compute a weighted average based on their values. However, filtering with a boolean condition destroys the original structure of the array: arr, which had shape (2, 2, 3, 2), is turned into a 1-dimensional array. This is of no use to me, because the selected elements should not all be combined with each other later on; only elements within the same subarray should be. How can I avoid this flattening?

    >>> arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
    ...                    [[4, 44], [5, 55], [6, 66]]],
    ...                   [[[7, 77], [8, 88], [9, 99]],
    ...                    [[0, 32], [1, 33], [2, 34]]]])
    >>> arr
    array([[[[ 1, 11],
             [ 2, 22],
             [ 3, 33]],

            [[ 4, 44],
             [ 5, 55],
             [ 6, 66]]],


           [[[ 7, 77],
             [ 8, 88],
             [ 9, 99]],

            [[ 0, 32],
             [ 1, 33],
             [ 2, 34]]]])
    >>> arr.shape
    (2, 2, 3, 2)
    >>> arr[arr>3]
    array([11, 22, 33,  4, 44,  5, 55,  6, 66,  7, 77,  8, 88,  9, 99, 32, 33,
           34])
    >>> arr[arr>3].shape
    (18,)
orange asked Mar 14 '15 06:03


2 Answers

Check out numpy.where:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

To keep the same dimensionality you need a fill value for the elements that fail the condition. In the example below I use 0, but you could also use np.nan (which converts the result to a float array).

np.where(arr>3, arr, 0) 

returns

    array([[[[ 0, 11],
             [ 0, 22],
             [ 0, 33]],

            [[ 4, 44],
             [ 5, 55],
             [ 6, 66]]],


           [[[ 7, 77],
             [ 8, 88],
             [ 9, 99]],

            [[ 0, 32],
             [ 0, 33],
             [ 0, 34]]]])
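If the fill value should not contribute to a later average, one option is to fill with np.nan and reduce with NumPy's nan-aware functions. The following is only a minimal sketch: averaging over the last two axes (one mean per (3, 2) subarray) is an assumption about what "subarrays" means in the question, not something it states.

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

# np.nan is a float, so the integer input is upcast to float64
filled = np.where(arr > 3, arr, np.nan)
print(filled.shape)  # (2, 2, 3, 2)

# Mean of the selected elements within each (3, 2) subarray
# (the choice of axes is an assumption for illustration)
sub_means = np.nanmean(filled, axis=(2, 3))
print(sub_means.shape)  # (2, 2)
```

The shape is preserved by np.where, and the reduction then collapses only the axes you name, so elements from different subarrays are never mixed.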
Alex answered Sep 30 '22 16:09


You might consider using an np.ma.masked_array to represent the subset of elements that satisfy your condition:

    import numpy as np

    arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                       [[4, 44], [5, 55], [6, 66]]],
                      [[[7, 77], [8, 88], [9, 99]],
                       [[0, 32], [1, 33], [2, 34]]]])

    # mask every element that is less than 3
    masked_arr = np.ma.masked_less(arr, 3)

    print(masked_arr)
    # [[[[-- 11]
    #    [-- 22]
    #    [3 33]]
    #
    #   [[4 44]
    #    [5 55]
    #    [6 66]]]
    #
    #
    #  [[[7 77]
    #    [8 88]
    #    [9 99]]
    #
    #   [[-- 32]
    #    [-- 33]
    #    [-- 34]]]]

As you can see, the masked array retains its original dimensions. You can access the underlying data and the mask via the .data and .mask attributes, respectively. Most NumPy functions ignore masked values, e.g.:

    # mean of whole array
    print(arr.mean())
    # 26.75

    # mean of non-masked elements only
    print(masked_arr.mean())
    # 33.4736842105
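Since the question asks for a weighted average over subarrays, np.ma.average may be a closer fit than a plain mean. This is a hedged sketch: the weights array is hypothetical (the question does not say what the weights are), and merging the last two axes so that each (3, 2) subarray yields one value is likewise an assumption.

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

# Same mask as in the answer: note that masked_less(arr, 3) keeps the value 3;
# np.ma.masked_less_equal(arr, 3) would mirror the question's arr > 3 exactly.
masked_arr = np.ma.masked_less(arr, 3)

# Hypothetical weights, one per element (illustrative only):
# the second column counts twice as much as the first.
weights = np.ones(arr.shape)
weights[..., 1] = 2.0

# Merge the last two axes so each subarray becomes one row of 6 elements
flat = masked_arr.reshape(2, 2, -1)
flat_weights = weights.reshape(2, 2, -1)

# Weighted average per subarray; masked elements are left out
sub_avg = np.ma.average(flat, axis=-1, weights=flat_weights)
print(sub_avg.shape)  # (2, 2)
```

Because the mask travels with the array through the reshape, no fill value is needed and the masked elements contribute neither to the weighted sum nor to the sum of weights.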

The result of an element-wise operation between a masked array and a non-masked array also preserves the mask:

    masked_sum = masked_arr + np.random.randn(*arr.shape)

    print(masked_sum)
    # [[[[-- 11.359989067421582]
    #    [-- 23.249092437269162]
    #    [3.326111354088174 32.679132708120726]]
    #
    #   [[4.289134334263137 43.38559221094378]
    #    [6.028063054523145 53.5043991898567]
    #    [7.44695154979811 65.56890530368757]]]
    #
    #
    #  [[[8.45692625294376 77.36860675985407]
    #    [5.915835159196378 87.28574554110307]
    #    [8.251106168209688 98.7621940026713]]
    #
    #   [[-- 33.24398289945855]
    #    [-- 33.411941757624284]
    #    [-- 34.964817895873715]]]]

The sum is only computed over the non-masked values of masked_arr; you can see this by looking at masked_sum.data, where the masked positions still hold their original values:

    print(masked_sum.data)
    # [[[[  1.          11.35998907]
    #    [  2.          23.24909244]
    #    [  3.32611135  32.67913271]]
    #
    #   [[  4.28913433  43.38559221]
    #    [  6.02806305  53.50439919]
    #    [  7.44695155  65.5689053 ]]]
    #
    #
    #  [[[  8.45692625  77.36860676]
    #    [  5.91583516  87.28574554]
    #    [  8.25110617  98.762194  ]]
    #
    #   [[  0.          33.2439829 ]
    #    [  1.          33.41194176]
    #    [  2.          34.9648179 ]]]]
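If a plain ndarray is needed again at some point, the mask can be inspected or replaced by an explicit fill value. A short sketch, assuming 0 is an acceptable fill value for your use case:

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

masked_arr = np.ma.masked_less(arr, 3)

# The boolean mask has the same shape as the data (True means masked)
print(masked_arr.mask.shape)  # (2, 2, 3, 2)

# Number of non-masked elements
print(masked_arr.count())  # 19

# Replace masked entries with a fill value and get a regular ndarray back
filled = masked_arr.filled(0)
print(type(filled).__name__)  # ndarray
```

For this mask, filled holds the same values as np.where(arr >= 3, arr, 0), so the two answers can be combined as needed.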
ali_m answered Sep 30 '22 16:09