
Computing average of non-zero values

I have lists from which I want the average of the non-zero values.

E.g.:

 [2,2,0,0,0] -> 2
 [1,1,0,1,0] -> 1
 [0,0,0,9,0] -> 9
 [2,3,0,0,0] -> 2.5

Currently I'm doing this:

list_ = [1,1,0,1,0]
non_zero = [float(v) for v in list_ if v>0]
average = sum(non_zero)/len(non_zero)

How can I do this operation more efficiently?

Luis Ramon Ramirez Rodriguez asked Jan 12 '17 00:01


3 Answers

If you start with a numpy array, you can use np.nonzero to filter the array, then take the mean:

import numpy as np

a = np.array([2,3,0,0,0])
average = a[np.nonzero(a)].mean()

You could also filter by boolean indexing, which appears to be faster:

average = a[a!=0].mean()

You could also easily change the method above to filter for positive values by using a>0.
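
To make the difference between the two filters concrete, here is a minimal sketch. The helper names `mean_nonzero` and `mean_positive` are made up for illustration, and returning NaN for an input with nothing to average is an assumption about what you would want in that case:

```python
import numpy as np

def mean_nonzero(values):
    """Mean of every entry that is not zero (negative values included)."""
    a = np.asarray(values, dtype=float)
    selected = a[a != 0]          # boolean mask keeps entries != 0
    # Assumption: report NaN instead of warning when nothing is selected.
    return float(selected.mean()) if selected.size else float("nan")

def mean_positive(values):
    """Mean of the strictly positive entries only."""
    a = np.asarray(values, dtype=float)
    selected = a[a > 0]           # boolean mask keeps entries > 0
    return float(selected.mean()) if selected.size else float("nan")

print(mean_nonzero([2, 3, 0, 0, 0]))   # 2.5
print(mean_positive([2, -2, 0, 4]))    # 3.0
```

The two functions agree whenever the data has no negative values, as in the question's examples.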

Timings

Using the following setup:

a = np.random.randint(100, size=10**6)

I get the following timings:

%timeit a[a!=0].mean()
100 loops, best of 3: 4.59 ms per loop

%timeit a[a.nonzero()].mean()
100 loops, best of 3: 9.82 ms per loop

root answered Sep 23 '22 12:09


Here's a vectorized approach: convert the list of lists to a 2D array, then divide each row's sum by its count of non-zero entries (the zeros add nothing to the sum) -

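from __future__ import division  # true division on Python 2; has no effect on Python 3
import numpy as np

a = np.asarray(list_)  # list_ is a list of equal-length lists
a.sum(1)/(a!=0).sum(1)  # row sums divided by per-row counts of non-zeros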

Sample run -

In [32]: list_  #  Input list of lists
Out[32]: [[2, 2, 0, 0, 0], [1, 1, 0, 1, 0], [0, 0, 0, 9, 0], [2, 3, 0, 0, 0]]

In [33]: a = np.asarray(list_) # Convert to array

In [34]: a.sum(1)/(a!=0).sum(1) # Divide row sums by count of non-zeros 
Out[34]: array([ 2. ,  1. ,  9. ,  2.5])
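
One case the sample run does not cover is a row made up entirely of zeros, where the non-zero count is 0 and the division is 0/0. A possible way to handle it, sketched under the assumption that NaN is an acceptable result for such a row (the function name `rowwise_nonzero_mean` is hypothetical):

```python
import numpy as np

def rowwise_nonzero_mean(rows):
    """Per-row mean of non-zero entries; NaN for rows with no non-zero entry."""
    a = np.asarray(rows, dtype=float)       # assumes all rows have equal length
    totals = a.sum(axis=1)                  # zeros contribute nothing to the sum
    counts = (a != 0).sum(axis=1)           # number of non-zero entries per row
    out = np.full(totals.shape, np.nan)     # default result for all-zero rows
    # Divide only where the count is positive; other slots keep their NaN.
    np.divide(totals, counts, out=out, where=counts > 0)
    return out

list_ = [[2, 2, 0, 0, 0], [1, 1, 0, 1, 0], [0, 0, 0, 9, 0],
         [2, 3, 0, 0, 0], [0, 0, 0, 0, 0]]
print(rowwise_nonzero_mean(list_))
```

The first four rows give the same values as the sample run above, and the all-zero row comes back as NaN without a runtime warning.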

Divakar answered Sep 22 '22 12:09


You could use np.nonzero:

l = np.array([2,2,0,0,0])

l[l.nonzero()].mean()
Out[17]: 2.0

A rough benchmark wrapping your current approach and this one in functions:

def luis_way(l):
    non_zero = [float(v) for v in l if v>0]
    average = sum(non_zero)/len(non_zero)
    return average

def np_way(l):
    return l[l.nonzero()].mean()



In [19]: some_l = np.random.randint(2, size=10000)
In [20]: %timeit luis_way(some_l)
100 loops, best of 3: 4.72 ms per loop
In [21]: %timeit np_way(some_l)
1000 loops, best of 3: 262 µs per loop

For small inputs, though, your current approach is probably fine. Note, however, that your current approach does not actually select all non-zero elements, only the positive ones.
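
If you stay with plain Python for small lists, one possible variant is a single pass that tests `v != 0` rather than `v > 0`, so negative values are counted too. This is only a sketch: the name `average_nonzero` is made up, and raising `ValueError` for an input with no non-zero values is an assumption (your current approach raises `ZeroDivisionError` there).

```python
def average_nonzero(values):
    """Average of the non-zero items in an iterable of numbers."""
    total = 0.0
    count = 0
    for v in values:
        if v != 0:            # keeps negatives, unlike the v > 0 test
            total += v
            count += 1
    if count == 0:
        # Assumption: fail loudly when there is nothing to average.
        raise ValueError("no non-zero values to average")
    return total / count

print(average_nonzero([2, 3, 0, 0, 0]))   # 2.5
print(average_nonzero([-1, 0, 3]))        # 1.0
```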

miradulo answered Sep 19 '22 12:09