For example, given:
import numpy as np
data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]])
I want to get a 3-dimensional array, looking like:
result = array([[[ 2., 0.],
                 [ 0., 2.]],
                [[ 0., 2.],
                 [ 0., 0.]]])
One way is:
newArray = np.zeros((2, 2, 2))
for row in data:
    newArray[row[0], row[1], row[2]] += 1
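The same loop can be expressed without a Python-level loop by using np.add.at. This is a minimal sketch, assuming every entry is a small non-negative integer that can be used directly as an index; the array shape is derived from the column maxima instead of being hard-coded.

```python
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

# One cell per possible value in each column (assumes non-negative integer data).
shape = tuple(int(m) + 1 for m in data.max(axis=0))
counts = np.zeros(shape)

# tuple(data.T) is (first column, second column, third column), i.e. one
# index array per axis. np.add.at adds unbuffered, so a row that occurs
# twice increments its cell twice.
np.add.at(counts, tuple(data.T), 1)
print(counts)
```

np.add.at is used rather than counts[tuple(data.T)] += 1 because the buffered fancy-index form would count a repeated row only once.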
What I'm trying to do is the following:
result = np.zeros((2, 2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            result[i,j,k] = (data[data[data[:,0]==i, 1]==j, 2]==k).sum()
This doesn't seem to work. I would like to achieve the desired result by sticking to my implementation rather than the one mentioned at the beginning, and without any extra imports (e.g. Counter).
Thanks.
You can also use numpy.histogramdd for this:
>>> np.histogramdd(data, bins=(2, 2, 2))[0]
array([[[ 2., 0.],
        [ 0., 2.]],
       [[ 0., 2.],
        [ 0., 0.]]])
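With bins=(2, 2, 2), histogramdd derives each axis's range from that column's minimum and maximum, so the bin edges depend on which values happen to occur in the data. A slightly more explicit sketch, assuming every column holds integers from 0 up to its maximum, fixes one unit-wide bin centred on each integer:

```python
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

# One bin per integer value 0..max in each column (assumes non-negative ints).
n_values = [int(m) + 1 for m in data.max(axis=0)]

# Edges at -0.5, 0.5, 1.5, ... so every integer sits in the middle of a bin.
hist, edges = np.histogramdd(
    data,
    bins=n_values,
    range=[(-0.5, n - 0.5) for n in n_values],
)
print(hist)
```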
The problem is that data[data[data[:,0]==i, 1]==j, 2]==k is not what you expect it to be.
Let's take this apart for the case (i,j,k) == (0,0,0).
data[:,0]==0 is [True, True, False, False, True, True], and data[data[:,0]==0] correctly gives us the rows where the first number is 0.
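Now, from those rows, we pick the ones where the second number is 0: data[data[:,0]==0, 1]==0, which gives us [True, False, False, True]. And this is the problem: that mask has only 4 entries, one for each row of the filtered subset, not one for each of the 6 rows of data. So if we use it to index data, i.e. data[data[data[:,0]==0, 1]==0], we do not get the rows where the first and second numbers are both 0, but the 0th and 3rd rows of data instead (this is what older NumPy versions return; current versions reject a 4-element mask on a 6-row array and raise an IndexError):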
In [51]: data[data[data[:,0]==0, 1]==0]
Out[51]: array([[0, 0, 0],
                [1, 0, 1]])
And if we now filter for the rows where the third number is 0, we get a result that is wrong with respect to the original data.
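A quick way to see the mismatch is to compare the shapes of the two masks. The sketch below (the names first, second and subset are only illustrative) also shows that the second mask selects the right rows once it is applied to the already filtered subset instead of to data:

```python
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

first = data[:, 0] == 0           # one entry per row of data (6 entries)
second = data[first, 1] == 0      # one entry per row of data[first] (4 entries)
print(first.shape, second.shape)  # (6,) (4,)

# `second` only makes sense relative to data[first], so index that subset:
subset = data[first][second]
print(subset)

# Positions of those rows in the original array:
print(np.flatnonzero(first)[second])
```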
And that's why your approach does not work. For better methods, see the other answers.
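If you want to keep the mask-and-sum structure of the loop in the question, one possible fix is to build each of the three masks over the full array, so they all have len(data) entries, and combine them element-wise with &. A sketch, assuming each column only takes the values 0 and 1 so that range(2) covers every index:

```python
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

result = np.zeros((2, 2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            # All three masks have one entry per row of data, so & lines them up.
            match = (data[:, 0] == i) & (data[:, 1] == j) & (data[:, 2] == k)
            result[i, j, k] = match.sum()

print(result)
```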