
count number of items in np.where() array python

I am currently attempting to compare two columns in a pandas DataFrame:

--------------- Cluster Assignment ---------------
           ID      Class   Cluster
    0   1000025      2        4
    1   1002945      2        2
    2   1015425      2        4
    3   1016277      2        2
    4   1017023      2        4
    5   1017122      4        2
    6   1018099      2        4
    7   1018561      2        4
    8   1033078      2        4
    9   1033078      2        4
    10  1035283      2        4
    11  1036172      2        4
    12  1041801      4        4
    13  1043999      2        4
    14  1044572      4        2
    15  1047630      4        4
    16  1048672      2        4
    17  1049815      2        4
    18  1050670      4        2
    19  1050718      2        4

I want to count the rows that don't match so I can compute the ratio of errors in my DataFrame (the full df is much longer than this). I'm using np.where() to make the comparison, and I get an accurate output of all the rows that are incorrect. Now I want to count how many rows are wrong and divide that by the total number of rows. My problem is that I'm getting:

>>> data= np.where(df7['Class']!=df7['Cluster'])
>>> print(len(data))
1

If I print the type of data, I get <class 'tuple'>. So I tried converting the tuple to a list:

>>> print(list(data))
[array([  9,  11,  17,  31,  32,  33,  34,  36,  38,  62,  64,  65, 135,
   156, 196, 201, 277, 301], dtype=int64)]

Obviously, this isn't helpful, because if I try to print or store the length of that list, I get:

>>> print(list(data))
[array([  9,  29,  30,  31,  33,  35,  59,  61,  62, 132, 153, 193, 198,
   274, 298], dtype=int64)]
>>> print('errors: ', len(cluster2wrong))
errors:  1

Could someone point me in the direction of how I can just count these items?

Nick Bohl asked Oct 31 '25 05:10


1 Answer

The result of np.where is a tuple containing n arrays, where n is the number of dimensions of the array you pass in. Your comparison is one-dimensional, so the tuple holds a single array, which is why len(data) is 1. The good news is that each of these n arrays has the same length (each holds one "index" for every found item), so you can just use the length of any of them:

>>> len(data[0])  # or len(data[i]) for any i < number of dimensions of the compared array

as already mentioned in the comments. However, if you just want to know how many items satisfy the condition, you can use np.count_nonzero:

>>> a = np.array([2,3,4,5])
>>> b = np.array([3,3,3,3])

>>> np.count_nonzero(a != b)
3
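
To tie this back to the question, here is a minimal sketch of both counting approaches and the error ratio. The two arrays are stand-ins for the values of df7['Class'] and df7['Cluster'], filled with the first eight rows of the table above. The names class_col, cluster_col, n_errors and error_ratio are made up for illustration. The same calls should also accept the pandas columns directly.

```python
import numpy as np

# Stand-ins for df7['Class'] and df7['Cluster'] (rows 0-7 of the table in the question).
class_col = np.array([2, 2, 2, 2, 2, 4, 2, 2])
cluster_col = np.array([4, 2, 4, 2, 4, 2, 4, 4])

mismatch = class_col != cluster_col        # boolean array, True where the columns differ

data = np.where(mismatch)                  # tuple with one index array, since the input is 1-D
n_errors = len(data[0])                    # count the indices, not the tuple

# Same count without building the index array first.
assert n_errors == np.count_nonzero(mismatch)

error_ratio = n_errors / len(class_col)    # fraction of rows that do not match
print('errors: ', n_errors, 'ratio: ', error_ratio)
```

If only the ratio is needed, the mean of the boolean array (mismatch.mean()) gives the same value, because True counts as 1 and False as 0.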

MSeifert answered Nov 01 '25 20:11


