
One line solution for editing a numpy array of counts? (python)

I want to build a NumPy array that records how many times each value (from 1 to 3) occurs at each location. For example, if I have:

a = np.array([[1,2,3], 
              [3,2,1], 
              [2,1,3], 
              [1,1,1]])

I want to get back an array like so:

[[[ 1  0  0]
  [ 0  1  0]
  [ 0  0  1]]

 [[ 0  0  1]
  [ 0  1  0]
  [ 1  0  0]]

 [[ 0  1  0]
  [ 1  0  0]
  [ 0  0  1]]

 [[ 1  0  0]
  [ 1  0  0]
  [ 1  0  0]]]

Where the array tells me that 1 occurs once in the first position, 2 occurs once in the second position, 3 occurs once in the third position, 3 occurs once in the fourth position, and so on. Later, I'll have more input arrays of the same dimensions, and I would like to add their counts to this running total.

The code I have right now is:

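import numpy as np

a = np.array([[1,2,3],
              [3,2,1],
              [2,1,3],
              [1,1,1]])

# Axes are (row, column, value); integer dtype because these are counts.
cumulative = np.zeros((4,3,3), dtype=int)

for r in range(len(cumulative)):
    for c in range(len(cumulative[0])):
        # Values run from 1 to 3, so shift by one to get a 0-based index.
        cumulative[r, c, a[r,c]-1] +=1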

This does give me the output I want. However, I would like to condense the for loops into one line, using a line similar to this:

cumulative[:, :, a[:, :]-1] +=1

This line doesn't work, and I can't find anything online on how to perform this operation. Any suggestions?

A. W. asked Jul 08 '17 21:07


2 Answers

IIUC, you could take advantage of broadcasting:

In [93]: ((a[:, None] - 1) == np.arange(3)[:, None]).swapaxes(2, 1).astype(int)
Out[93]: 
array([[[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]],

       [[0, 0, 1],
        [0, 1, 0],
        [1, 0, 0]],

       [[0, 1, 0],
        [1, 0, 0],
        [0, 0, 1]],

       [[1, 0, 0],
        [1, 0, 0],
        [1, 0, 0]]])
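
Since the question also asks about adding later inputs to a running total, here is a minimal sketch of how the comparison above could be accumulated. The helper name `one_hot_counts`, its `n_values` parameter, and the second array `b` are made up for illustration; the helper uses a trailing axis instead of `swapaxes`, which should give the same array as the expression above.

```python
import numpy as np

def one_hot_counts(arr, n_values=3):
    # Hypothetical helper: shift values to 0-based, then compare against
    # 0..n_values-1 along a new trailing axis.
    # The result has shape arr.shape + (n_values,).
    return ((arr[..., None] - 1) == np.arange(n_values)).astype(int)

a = np.array([[1, 2, 3],
              [3, 2, 1],
              [2, 1, 3],
              [1, 1, 1]])

# Made-up second input with the same shape as a.
b = np.array([[1, 1, 1],
              [3, 3, 3],
              [2, 2, 2],
              [1, 2, 3]])

# Running total of counts, one slot per (row, column, value).
cumulative = np.zeros(a.shape + (3,), dtype=int)
for arr in (a, b):
    cumulative += one_hot_counts(arr)

print(cumulative)
```

Each new input then costs one vectorised comparison and one addition, with no Python-level loop over the elements.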

DSM answered Sep 20 '22 10:09


It's technically not a one-liner, but if you ignore PEP 8's maximum line length then you can whittle it down to two lines.

import numpy as np

a = np.array([[1,2,3],
              [3,2,1],
              [2,1,3],
              [1,1,1]])

# Axes are (row, column, value), so each one-hot vector runs along the last axis.
out = np.zeros((a.shape[0], a.shape[1], 1 + a.max() - a.min()), dtype=np.int8)
out[np.repeat(np.arange(a.shape[0]), a.shape[1]), np.tile(
    np.arange(a.shape[1]), a.shape[0]), (a - a.min()).flatten()] = 1
print(out)

Which outputs:

[[[1 0 0]
  [0 1 0]
  [0 0 1]]

 [[0 0 1]
  [0 1 0]
  [1 0 0]]

 [[0 1 0]
  [1 0 0]
  [0 0 1]]

 [[1 0 0]
  [1 0 0]
  [1 0 0]]]

This is perhaps not the most gainly nor graceful solution, and unfortunately does not scale to n dimensions, but hopefully this (almost one-liner) is sufficiently vectorised for you.


It's quite hefty so I'll briefly run through how this works.

  • The output array is created full of zeros, with the length of each 'one-hot vector' equal to the number of values spanned by the input array, that is max - min + 1 (I assumed this is what you wanted, given that your example has no slot for the value zero).

  • np.tile and np.repeat are used with np.arange to produce the row and column index arrays, that is, the position of each element in a.

  • Fancy indexing is then used to set the entry matching each element's value to 1.
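
The same fancy-indexing idea can also be written with broadcast index arrays instead of np.repeat and np.tile, which comes close to the one-liner the question was after. This is only a sketch under the question's assumptions (values from 1 to 3, a `cumulative` array with axes row, column, value); the names `rows`, `cols` and `expected` are mine.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [3, 2, 1],
              [2, 1, 3],
              [1, 1, 1]])

cumulative = np.zeros(a.shape + (3,), dtype=int)

# Row indices as a column of shape (4, 1) and column indices of shape (3,)
# broadcast against a - 1 of shape (4, 3), so every (row, column) position
# is paired with the 0-based index of its own value.
rows = np.arange(a.shape[0])[:, None]
cols = np.arange(a.shape[1])
cumulative[rows, cols, a - 1] += 1

# Reference result from the explicit double loop in the question.
expected = np.zeros_like(cumulative)
for r in range(a.shape[0]):
    for c in range(a.shape[1]):
        expected[r, c, a[r, c] - 1] += 1

print(np.array_equal(cumulative, expected))
```

Every (row, column) pair appears exactly once in the index, so the in-place `+= 1` is safe here and the same line can be run again for each later input. If an index combination could repeat within one call, `np.add.at` would be the thing to reach for instead.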

Tom Wyllie answered Sep 21 '22 10:09