Python: Counting identical rows in an array (without any imports)

For example, given:

import numpy as np
data = np.array(
    [[0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0]])

I want to get a 3-dimensional array in which result[i, j, k] is the number of rows equal to [i, j, k]. For the data above it should look like:

result = array([[[ 2.,  0.],
                 [ 0.,  2.]],

                [[ 0.,  2.],
                 [ 0.,  0.]]])

One way is:

newArray = np.zeros((2, 2, 2))  # one counter per possible row [i, j, k]
for row in data:
    newArray[row[0]][row[1]][row[2]] += 1

What I'm trying to do is the following:

result = np.zeros((2, 2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            result[i,j,k] = (data[data[data[:,0]==i, 1]==j, 2]==k).sum()

This doesn't work. I would like to get the desired result with this indexing approach rather than the row-by-row loop shown first, and without any extra imports (e.g. Counter).

Thanks.

mihalios asked Feb 06 '14 18:02

2 Answers

You can also use numpy.histogramdd for this:

>>> np.histogramdd(data, bins=(2, 2, 2))[0]
array([[[ 2.,  0.],
        [ 0.,  2.]],

       [[ 0.,  2.],
        [ 0.,  0.]]])
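
One caveat worth sketching: with bins=(2, 2, 2) alone, the bin edges are derived from the minimum and maximum of each column, so the call above relies on every column containing both a 0 and a 1. A minimal variant, assuming the values are always 0 or 1, passes explicit limits through the range argument (the names limits, counts and edges are only illustrative):

```python
import numpy as np

data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]])

# Assumption: every value is 0 or 1. Two bins over (-0.5, 1.5) put the
# edges at -0.5, 0.5 and 1.5, so 0 and 1 each get their own bin even if
# a column happens to contain only one of the two values.
limits = [(-0.5, 1.5)] * 3
counts, edges = np.histogramdd(data, bins=(2, 2, 2), range=limits)

# counts[i, j, k] is the number of rows equal to [i, j, k]
print(counts)
```

The first element of the returned tuple is the count array; the second holds the bin edges for each dimension.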

Ashwini Chaudhary answered Nov 24 '22 02:11


The problem is that data[data[data[:,0]==i, 1]==j, 2]==k is not what you expect it to be.

Let's take this apart for the case (i,j,k) == (0,0,0).

data[:,0]==0 is [True, True, False, False, True, True], and data[data[:,0]==0] correctly gives us the rows where the first number is 0.

Now from those rows we pick the ones where the second number is 0: data[data[:,0]==0, 1]==0, which gives us [True, False, False, True]. This is the problem: the mask has only four entries, one for each row that passed the first filter, so its positions refer to the filtered rows and not to data. If we apply it to data anyway, i.e., data[data[data[:,0]==0, 1]==0], we do not get the rows where the first and second numbers are 0, but rows 0 and 3 of data instead:

In [51]: data[data[data[:,0]==0, 1]==0]
Out[51]: array([[0, 0, 0],
                [1, 0, 1]])

If we now filter for the rows where the third number is 0, we get the wrong result with respect to the original data.

And that's why your approach does not work. For better methods, see the other answers.
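
If you want to keep the triple loop from the question, one possible fix (a sketch, not the only way) is to test each column against the full array, so that every mask has one entry per row of data, and to combine the masks with &. The names first, second and mask are only illustrative. Note that recent NumPy versions reject a boolean mask whose length does not match the indexed axis with an IndexError, so the nested expression may fail outright instead of returning wrong rows.

```python
import numpy as np

data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]])

# The nested mask is shorter than the array it gets applied to:
first = data[:, 0] == 0        # 6 entries, one per row of data
second = data[first, 1] == 0   # 4 entries, one per row that passed `first`

# Testing each column on the full array keeps every mask at 6 entries,
# so the three conditions can be combined element-wise with &.
result = np.zeros((2, 2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            mask = (data[:, 0] == i) & (data[:, 1] == j) & (data[:, 2] == k)
            result[i, j, k] = mask.sum()  # count rows equal to [i, j, k]

print(result)
```

This still only needs numpy, which the question already imports.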

tobias_k answered Nov 24 '22 00:11