Different results to counting zero-crossings of a large sequence

Question

This question stems from looking at the answers provided to this question regarding counting the number of zero crossings. Several answer were provided that solve the problem, but the NumPy appproach destroyed the others with respect to time.

When I compared four of the answers however I notice that the NumPy solution provides a different result for large sequences. The four answers in question are loop and simple generator, better generator expression , and NumPy solution.

Question: Why does the NumPy solution provide a different result than the other three? (and which is correct?)

Here are the results for counting the number of zero crossings:

Blazing fast NumPy solution
total time: 0.303605794907 sec
Zero Crossings Small: 8
Zero Crossings Med: 54464
Zero Crossings Big: 5449071

Loop solution
total time: 15.6818780899 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847

Simple generator expression solution
total time: 16.3374049664 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847

Modified generator expression solution
total time: 13.6596589088 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847

And the code used to get the results:

import time
import numpy as np

def zero_crossings_loop(sequence):
    s = 0
    for ind, _ in enumerate(sequence):
        if ind+1 < len(sequence):
            if sequence[ind]*sequence[ind+1] < 0:
                s += 1
    return s

def print_three_results(r1, r2, r3):
    print 'Zero Crossings Small:', r1
    print 'Zero Crossings Med:', r2
    print 'Zero Crossings Big:', r3
    print '
'

small = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
med = np.random.randint(-10, 10, size=100000)
big = np.random.randint(-10, 10, size=10000000)

print 'Blazing fast NumPy solution'
tic = time.time()
z1 = (np.diff(np.sign(small)) != 0).sum()
z2 = (np.diff(np.sign(med)) != 0).sum()
z3 = (np.diff(np.sign(big)) != 0).sum()
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)

print 'Loop solution'
tic = time.time()
z1 = zero_crossings_loop(small)
z2 = zero_crossings_loop(med)
z3 = zero_crossings_loop(big)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)

print 'Simple generator expression solution'
tic = time.time()
z1 = sum(1 for i, _ in enumerate(small) if (i+1 < len(small)) if small[i]*small[i+1] < 0)
z2 = sum(1 for i, _ in enumerate(med) if (i+1 < len(med)) if med[i]*med[i+1] < 0)
z3 = sum(1 for i, _ in enumerate(big) if (i+1 < len(big)) if big[i]*big[i+1] < 0)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)

print 'Modified generator expression solution'
tic = time.time()
z1 = sum(1 for i in xrange(1, len(small)) if small[i-1]*small[i] < 0)
z2 = sum(1 for i in xrange(1, len(med)) if med[i-1]*med[i] < 0)
z3 = sum(1 for i in xrange(1, len(big)) if big[i-1]*big[i] < 0)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)

jwilner · Accepted Answer

Your solutions differ in their treatment of zero. The numpy.diff solution will still return a diff going from -1 to 0 or 1 to 0, counting those as a zero crossing, while your iterative solutions don't because they use the product being less than zero as their criterion. Instead, test for <= 0, and the numbers will be equivalent.

Mike Müller · Answer

I get the same results as the loop with:

((array[:-1] * array[1:]) < 0).sum()

This:

small = np.array([80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3,
                 -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2])
med = np.random.randint(-10, 10, size=100000)
big = np.random.randint(-10, 10, size=10000000)

for name, array in [('small', small), ('med', med), ('big', big)]:
    print('loop ', name, zero_crossings_loop(array))
    print('Numpy', name, ((array[:-1] * array[1:]) < 0).sum())

prints:

loop  small 8
Numpy small 8
loop  med 44901
Numpy med 44901
loop  big 4496911
Numpy big 4496911

UDPATE

This version avoids the problem with zeros:

def numpy_zero_crossings2(array):
    nonzero_array = array[np.nonzero(array)]
    return ((nonzero_array[:-1] * nonzero_array[1:]) < 0).sum()

It gives the same result as the answer by @djsutton:

>>> numpy_zero_crossings2(big) == numpy_zero_crossings(big)     
True

but seesm a bit faster:

%timeit numpy_zero_crossings2(big)
1 loops, best of 3: 194 ms per loop

vs:

%timeit numpy_zero_crossings(big)
1 loops, best of 3: 227 ms per loop

djsutton · Answer

Both the iterative and numpy solutions do not do well at counting crossings when a data element is equal to zero. For the data [1,0,-1] the iterative solution gives 0 crossings and the numpy solution gives 2 crossings, neither of which seems correct.

One solution would be to drop data elements equal to zero. In NumPy you might try something like

def numpy_zero_crossings(data):
    return (np.diff(np.sign(data)[np.nonzero(data)]) != 0).sum()

However, this introduces another iteration through the array, so it will increase run time by another O(n)

Different results to counting zero-crossings of a large sequence

Tags:

python

numpy

Scott

3 Answers

jwilner

Mike Müller

djsutton

Recent Activity

Donate For Us

Different results to counting zero-crossings of a large sequence

Tags:

python

numpy

Scott

3 Answers

jwilner

Mike Müller

djsutton

Related questions

Recent Activity

Donate For Us