Counting non-zero elements within each row and within each column of a 2D NumPy array

I have a NumPy matrix that contains mostly non-zero values, but occasionally will contain a zero value. I need to be able to:

  1. Count the non-zero values in each row and put that count into a variable that I can use in subsequent operations, perhaps by iterating through row indices and performing the calculations during the iterative process.

  2. Count the non-zero values in each column and put that count into a variable that I can use in subsequent operations, perhaps by iterating through column indices and performing the calculations during the iterative process.

For example, one thing I need to do is to sum each row and then divide each row sum by the number of non-zero values in each row, reporting a separate result for each row index. And then I need to sum each column and then divide the column sum by the number of non-zero values in the column, also reporting a separate result for each column index. I need to do other things as well, but they should be easy after I figure out how to do the things that I am listing here.

The code I am working with is below. You can see that I am creating an array of zeros and then populating it from a CSV file. Some of the rows will contain values for all the columns, but other rows will still have some zeros remaining in some of the last columns, thus creating the problem described above.

The last five lines of the code below are from another posting on this forum. These last five lines of code return a printed list of row/column indices for the zeros. However, I do not know how to use that resulting information to create the non-zero row counts and non-zero column counts described above.

```python
ANOVAInputMatrixValuesArray=zeros([len(TestIDs),9],float)
j=0
for j in range(0,len(TestIDs)):
    TestID=str(TestIDs[j])
    ReadOrWrite='Read'
    fileName=inputFileName
    directory=GetCurrentDirectory(arguments that return correct directory)
    inputfile=open(directory,'r')
    reader=csv.reader(inputfile)
    m=0
    for row in reader:
        if m<9:
            if row[0]!='TestID':
                ANOVAInputMatrixValuesArray[(j-1),m]=row[2]
                m+=1
    inputfile.close()

IndicesOfZeros = indices(ANOVAInputMatrixValuesArray.shape)
locs = IndicesOfZeros[:,ANOVAInputMatrixValuesArray == 0]
pts = hsplit(locs, len(locs[0]))
for pt in pts:
    print(', '.join(str(p[0]) for p in pt))
```

Can anyone help me with this?

asked Sep 26 '10 at 09:09 by MedicalMath



People also ask

How do you count nonzero elements in a NumPy array?

np.count_nonzero(arr, axis=None) counts the number of non-zero values in the array arr. Parameters: arr (array_like) is the array in which to count non-zeros; axis (int or tuple of ints, optional) is the axis or tuple of axes along which to count. With the default axis=None, all elements of the array are counted.
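A minimal sketch of count_nonzero on the same 3x3 example array that the answer uses. Note that the axis argument is only available in NumPy 1.12 and later; on older installations the (a != 0).sum(axis) form shown in the answer gives the same counts.

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

# Total number of non-zero entries in the whole array
total = np.count_nonzero(a)                   # 6
# One count per column (axis=0) and one count per row (axis=1)
per_column = np.count_nonzero(a, axis=0)      # [2, 1, 3]
per_row = np.count_nonzero(a, axis=1)         # [2, 3, 1]
```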

How do you count the number of zeros in a NumPy array?

To count the zeros in an array, pass a comparison to np.count_nonzero, as in np.count_nonzero(arr == 0). The comparison produces a boolean array, and count_nonzero returns the number of its True entries, i.e. the number of elements that are equal to zero.
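A short sketch of counting zeros, both overall and per row, again on the answer's example array. The last line checks the equivalent formulation "array size minus the non-zero count".

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

# a == 0 is a boolean mask; count_nonzero counts its True entries
zeros_total = np.count_nonzero(a == 0)             # 3
zeros_per_row = np.count_nonzero(a == 0, axis=1)   # [1, 0, 2]

# Equivalent: total number of elements minus the number of non-zero ones
assert zeros_total == a.size - np.count_nonzero(a)
```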


1 Answer

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

columns = (a != 0).sum(0)
rows    = (a != 0).sum(1)
```

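The expression (a != 0) is a boolean array of the same shape as the original a; it contains True wherever the corresponding element of a is non-zero.

The .sum(axis) method sums the elements along the given axis: axis 0 runs down the rows (one result per column) and axis 1 runs across the columns (one result per row). Because True counts as 1 and False as 0, the sum of a boolean array is the number of True elements.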
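With a square array it is easy to mix up the two axes, so here is a small sketch on a hypothetical non-square 2x3 array: summing over axis 0 gives one value per column (3 values), summing over axis 1 gives one value per row (2 values).

```python
import numpy as np

# Hypothetical 2x3 example, chosen so the two result shapes differ
b = np.array([[5, 0, 2],
              [0, 0, 9]])

mask = b != 0                   # dtype bool, shape (2, 3)
per_column = mask.sum(axis=0)   # sums down the rows: shape (3,) -> [1, 0, 2]
per_row = mask.sum(axis=1)      # sums across the columns: shape (2,) -> [2, 1]
```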

The variables columns and rows contain the number of non-zero (element != 0) values in each column/row of your original array:

```python
columns = np.array([2, 1, 3])
rows    = np.array([2, 3, 1])
```
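The question already computes the positions of the zeros and asks how to turn them into counts. One way to do that, sketched here as an alternative to the boolean sum, is np.nonzero on the zero mask followed by np.bincount; minlength makes sure rows or columns that contain no zeros still appear with a count of 0.

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

# Row and column indices of every zero (the same information as locs in the question)
zero_rows, zero_cols = np.nonzero(a == 0)

# Number of zeros in each row / column
zeros_per_row = np.bincount(zero_rows, minlength=a.shape[0])   # [1, 0, 2]
zeros_per_col = np.bincount(zero_cols, minlength=a.shape[1])   # [1, 2, 0]

# Non-zero counts: row length (number of columns) minus zeros, and vice versa
nonzero_per_row = a.shape[1] - zeros_per_row    # [2, 3, 1]
nonzero_per_col = a.shape[0] - zeros_per_col    # [2, 1, 3]
```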

EDIT: The whole code could look like this (with a few simplifications of your original code):

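```python
from numpy import zeros, loadtxt

ANOVAInputMatrixValuesArray = zeros([len(TestIDs), 9], float)
for j, TestID in enumerate(TestIDs):
    ReadOrWrite = 'Read'
    fileName = inputFileName
    # placeholder from the question: returns the path of the CSV file
    directory = GetCurrentDirectory(arguments that return correct directory)
    # use directory or filename to get the CSV file?
    with open(directory, 'r') as csvfile:
        # the header line (starting with 'TestID') is treated as a comment and skipped;
        # only the third column is read; ndmin=1 keeps a one-value file 1-D
        values = loadtxt(csvfile, comments='TestID', delimiter=',',
                         usecols=(2,), ndmin=1)[:9]
    # files with fewer than 9 values leave trailing zeros in row j
    ANOVAInputMatrixValuesArray[j, :len(values)] = values

nonZeroCols = (ANOVAInputMatrixValuesArray != 0).sum(0)
nonZeroRows = (ANOVAInputMatrixValuesArray != 0).sum(1)
```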
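Since the real files and GetCurrentDirectory are not shown, here is a self-contained sketch of the same loading pattern using hypothetical in-memory CSV contents (io.StringIO stands in for the opened files, and 3 columns stand in for the 9 in the question). It uses skiprows=1 to skip the header instead of the comments trick, which is an assumption that every file has exactly one header line. The shorter second file shows how trailing zeros end up in a row.

```python
import io
import numpy as np

# Hypothetical CSV contents; the real file layout is only inferred from the question
csv_texts = [
    "TestID,Step,Value\nA,1,2.5\nA,2,3.0\nA,3,4.5\n",
    "TestID,Step,Value\nB,1,1.0\n",   # shorter file -> trailing zeros remain
]

n_cols = 3   # the question uses 9
matrix = np.zeros((len(csv_texts), n_cols), dtype=float)

for j, text in enumerate(csv_texts):
    # skip the header, read only the third column, keep a 1-value file 1-D
    values = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1,
                        usecols=(2,), ndmin=1)[:n_cols]
    matrix[j, :len(values)] = values   # fill only the columns this file provides

nonzero_per_row = np.count_nonzero(matrix, axis=1)   # [3, 1]
nonzero_per_col = np.count_nonzero(matrix, axis=0)   # [2, 1, 1]
```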

EDIT 2:

To get the mean of the non-zero values in each column/row, use the following:

```python
# on Python 2 with an integer array, / floors the result; use a float array there
colMean = a.sum(0) / (a != 0).sum(0)
rowMean = a.sum(1) / (a != 0).sum(1)
```
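The question mentions reporting a separate result for each row and column index. The vectorised means can simply be iterated afterwards; a minimal sketch on a float version of the example array (the print format is just an illustration):

```python
import numpy as np

a = np.array([[1., 0., 1.],
              [2., 3., 4.],
              [0., 0., 7.]])

row_mean = a.sum(axis=1) / (a != 0).sum(axis=1)   # [1.0, 3.0, 7.0]
col_mean = a.sum(axis=0) / (a != 0).sum(axis=0)   # [1.5, 3.0, 4.0]

# Report one result per row index and per column index
for i, value in enumerate(row_mean):
    print(f"row {i}: mean of non-zero values = {value}")
for k, value in enumerate(col_mean):
    print(f"column {k}: mean of non-zero values = {value}")
```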

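What do you want to do if there are no non-zero elements in a column/row? In that case the division above is 0/0, which NumPy evaluates to nan and reports with a RuntimeWarning. The code can be adapted to whatever behaviour you need.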
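One possible adaptation, sketched with np.divide and its where/out arguments: the division is only performed where the count is positive, and all-zero rows keep the fill value from out. The fill value 0.0 is an arbitrary choice here; np.full_like(sums, np.nan) would mark such rows as nan explicitly without triggering the warning.

```python
import numpy as np

a = np.array([[1., 0., 1.],
              [0., 0., 0.],   # a row with no non-zero values
              [0., 0., 7.]])

sums = a.sum(axis=1)
counts = np.count_nonzero(a, axis=1)

# Divide only where counts > 0; elsewhere the value from `out` (0.0) is kept
row_mean = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
# row_mean -> [1.0, 0.0, 7.0]
```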

answered Oct 17 '22 at 23:10 by eumiro
