Versions of this question have already been asked but I have not found a satisfactory answer.
Problem: given a large numpy vector, find indices of the vector elements which are duplicated (a variation of that could be comparison with tolerance).
So the problem is ~O(N^2) and memory bound (at least from the current algorithm point of view). I wonder why whatever I tried Python is 100x or more slower than an equivalent C code.
import numpy as np
N = 10000
vect = np.arange(float(N))
vect[N/2] = 1
vect[N/4] = 1
dupl = []
print("init done")
counter = 0
for i in range(N):
for j in range(i+1, N):
if vect[i] == vect[j]:
dupl.append(j)
counter += 1
print("counter =", counter)
print(dupl)
# For simplicity, this code ignores repeated indices
# which can be trimmed later. Ref output is
# counter = 3
# [2500, 5000, 5000]
I tried using numpy iterators but they are even worse (~ x4-5) http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
Using N=10,000 I'm getting 0.1 sec in C, 12 sec in Python (code above), 40 sec in Python using np.nditer, 50 sec in Python using np.ndindex. I pushed it to N=160,000 and the timing scales as N^2 as expected.
Since the answers have stopped coming and none was totally satisfactory, for the record I post my own solution.
It is my understanding that it's the assignment which makes Python slow in this case, not the nested loops as I thought initially. Using a library or compiled code eliminates the need for assignments and performance improves dramatically.
from __future__ import print_function
import numpy as np
from numba import jit
N = 10000
vect = np.arange(N, dtype=np.float32)
vect[N/2] = 1
vect[N/4] = 1
dupl = np.zeros(N, dtype=np.int32)
print("init done")
# uncomment to enable compiled function
#@jit
def duplicates(i, counter, dupl, vect):
eps = 0.01
ns = len(vect)
for j in range(i+1, ns):
# replace if to use approx comparison
#if abs(vect[i] - vect[j]) < eps:
if vect[i] == vect[j]:
dupl[counter] = j
counter += 1
return counter
counter = 0
for i in xrange(N):
counter = duplicates(i, counter, dupl, vect)
print("counter =", counter)
print(dupl[0:counter])
Tests
# no jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]
elapsed 10.135 s
# with jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]
elapsed 0.480 s
The performance of compiled version (with @jit uncommented) is close to C code performance ~0.1 - 0.2 sec. Perhaps eliminating the last loop could improve the performance even further. The difference in performance is even stronger when using approximate comparison using eps while there is very little difference for the compiled version.
# no jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]
elapsed 109.218 s
# with jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]
elapsed 0.506 s
This is ~ 200x difference. In the real code, I had to put both loops in the function as well as use a function template with variable types so it was a bit more complex but not very much.
Python itself is a highly-dynamic, slow, language. The idea in numpy is to use vectorization, and avoid explicit loops. In this case, you can use np.equal.outer
. You can start with
a = np.equal.outer(vect, vect)
Now, for example, to find the sum:
>>> np.sum(a)
10006
To find the indices of i that are equal, you can do
np.fill_diagonal(a, 0)
>>> np.nonzero(np.any(a, axis=0))[0]
array([ 1, 2500, 5000])
Timing
def find_vec():
a = np.equal.outer(vect, vect)
s = np.sum(a)
np.fill_diagonal(a, 0)
return np.sum(a), np.nonzero(np.any(a, axis=0))[0]
>>> %timeit find_vec()
1 loops, best of 3: 214 ms per loop
def find_loop():
dupl = []
counter = 0
for i in range(N):
for j in range(i+1, N):
if vect[i] == vect[j]:
dupl.append(j)
counter += 1
return dupl
>>> % timeit find_loop()
1 loops, best of 3: 8.51 s per loop
This solution using the numpy_indexed package has complexity n Log n, and is fully vectorized; so not terribly different from C performance, in all likelihood.
import numpy_indexed as npi
dpl = np.flatnonzero(npi.multiplicity(vect) > 1)
The obvious question is why you want to do this in this way. NumPy arrays are intended to be opaque data structures – by this I mean NumPy arrays are intended to be created inside the NumPy system and then operations sent in to the NumPy subsystem to deliver a result. i.e. NumPy should be a black box into which you throw requests and out come results.
So given the code above I am not at all suprised that NumPy performance is worse than dreadful.
The following should be effectively what you want, I believe, but done the NumPy way:
import numpy as np
N = 10000
vect = np.arange(float(N))
vect[N/2] = 1
vect[N/4] = 1
print([np.where(a == vect)[0] for a in vect][1])
# Delivers [1, 2500, 5000]
Approach #1
You can simulate that iterator dependency criteria for a vectorized solution using a triangular matrix
. This is based on this post
that dealt with multiplication involving iterator dependency
. For performing the elementwise equality of each element in vect
against its all elements, we can use NumPy broadcasting
. Finally, we can use np.count_nonzero
to get the count, as it's supposed to be very efficient in summing purposes on boolean arrays.
So, we would have a solution like so -
mask = np.triu(vect[:,None] == vect,1)
counter = np.count_nonzero(mask)
dupl = np.where(mask)[1]
If you only care about the count counter
, we could have two more approaches as listed next.
Approach #2
We can avoid the use of the triangular matrix and simply get the entire count and just subtract the contribution from diagonal elements and consider just one of either lower of upper triangular regions by just halving the remaining count as the contributions from either ones would be identical.
So, we would have a modified solution like so -
counter = (np.count_nonzero(vect[:,None] == vect) - vect.size)//2
Approach #3
Here's an entirely different approach that uses the fact the count of each unique element plays a cumsumed contribution to the final total.
So, with that idea in mind, we would have a third approach like so -
count = np.bincount(vect) # OR np.unique(vect,return_counts=True)[1]
idx = count[count>1]
id_arr = np.ones(idx.sum(),dtype=int)
id_arr[0] = 0
id_arr[idx[:-1].cumsum()] = -idx[:-1]+1
counter = np.sum(id_arr.cumsum())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With