This question stems from looking at the answers provided to this question regarding counting the number of zero crossings. Several answer were provided that solve the problem, but the NumPy appproach destroyed the others with respect to time.
When I compared four of the answers however I notice that the NumPy solution provides a different result for large sequences. The four answers in question are loop and simple generator, better generator expression , and NumPy solution.
Question: Why does the NumPy solution provide a different result than the other three? (and which is correct?)
Here are the results for counting the number of zero crossings:
Blazing fast NumPy solution
total time: 0.303605794907 sec
Zero Crossings Small: 8
Zero Crossings Med: 54464
Zero Crossings Big: 5449071
Loop solution
total time: 15.6818780899 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847
Simple generator expression solution
total time: 16.3374049664 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847
Modified generator expression solution
total time: 13.6596589088 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847
And the code used to get the results:
import time
import numpy as np
def zero_crossings_loop(sequence):
s = 0
for ind, _ in enumerate(sequence):
if ind+1 < len(sequence):
if sequence[ind]*sequence[ind+1] < 0:
s += 1
return s
def print_three_results(r1, r2, r3):
print 'Zero Crossings Small:', r1
print 'Zero Crossings Med:', r2
print 'Zero Crossings Big:', r3
print '\n'
small = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
med = np.random.randint(-10, 10, size=100000)
big = np.random.randint(-10, 10, size=10000000)
print 'Blazing fast NumPy solution'
tic = time.time()
z1 = (np.diff(np.sign(small)) != 0).sum()
z2 = (np.diff(np.sign(med)) != 0).sum()
z3 = (np.diff(np.sign(big)) != 0).sum()
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)
print 'Loop solution'
tic = time.time()
z1 = zero_crossings_loop(small)
z2 = zero_crossings_loop(med)
z3 = zero_crossings_loop(big)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)
print 'Simple generator expression solution'
tic = time.time()
z1 = sum(1 for i, _ in enumerate(small) if (i+1 < len(small)) if small[i]*small[i+1] < 0)
z2 = sum(1 for i, _ in enumerate(med) if (i+1 < len(med)) if med[i]*med[i+1] < 0)
z3 = sum(1 for i, _ in enumerate(big) if (i+1 < len(big)) if big[i]*big[i+1] < 0)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)
print 'Modified generator expression solution'
tic = time.time()
z1 = sum(1 for i in xrange(1, len(small)) if small[i-1]*small[i] < 0)
z2 = sum(1 for i in xrange(1, len(med)) if med[i-1]*med[i] < 0)
z3 = sum(1 for i in xrange(1, len(big)) if big[i-1]*big[i] < 0)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)
Your solutions differ in their treatment of zero. The numpy.diff solution will still return a diff going from -1 to 0 or 1 to 0, counting those as a zero crossing, while your iterative solutions don't because they use the product being less than zero as their criterion. Instead, test for <= 0
, and the numbers will be equivalent.
I get the same results as the loop with:
((array[:-1] * array[1:]) < 0).sum()
This:
small = np.array([80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3,
-95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2])
med = np.random.randint(-10, 10, size=100000)
big = np.random.randint(-10, 10, size=10000000)
for name, array in [('small', small), ('med', med), ('big', big)]:
print('loop ', name, zero_crossings_loop(array))
print('Numpy', name, ((array[:-1] * array[1:]) < 0).sum())
prints:
loop small 8
Numpy small 8
loop med 44901
Numpy med 44901
loop big 4496911
Numpy big 4496911
UDPATE
This version avoids the problem with zeros:
def numpy_zero_crossings2(array):
nonzero_array = array[np.nonzero(array)]
return ((nonzero_array[:-1] * nonzero_array[1:]) < 0).sum()
It gives the same result as the answer by @djsutton:
>>> numpy_zero_crossings2(big) == numpy_zero_crossings(big)
True
but seesm a bit faster:
%timeit numpy_zero_crossings2(big)
1 loops, best of 3: 194 ms per loop
vs:
%timeit numpy_zero_crossings(big)
1 loops, best of 3: 227 ms per loop
Both the iterative and numpy solutions do not do well at counting crossings when a data element is equal to zero. For the data [1,0,-1] the iterative solution gives 0 crossings and the numpy solution gives 2 crossings, neither of which seems correct.
One solution would be to drop data elements equal to zero. In NumPy you might try something like
def numpy_zero_crossings(data):
return (np.diff(np.sign(data)[np.nonzero(data)]) != 0).sum()
However, this introduces another iteration through the array, so it will increase run time by another O(n)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With