I'm trying to implement a specific binary search algorithm. results should start as an empty set, and during the search the results variable is updated with the union of each new batch of results we get.
Basically:
results = set()
for result in search():
    results = results.union(result)
But code like that won't work with NumPy arrays, so we use np.union1d for this purpose:
results = np.array([])
for result in search():
    results = np.union1d(results, result)
The code above doesn't really work either: if we have, for example, two vectors a = [1, 2, 3] and b = [3, 4, 5], then np.union1d(a, b) will return:
[1, 2, 3, 4, 5]
But I want it to return:
[[1, 2, 3], [3, 4, 5]]
Since there are no duplicate vectors. And if we then computed, for example, union([[1, 2, 3], [3, 4, 5]], [1, 2, 3]), the return value should remain:
[[1, 2, 3], [3, 4, 5]]
So I would say that I require a numpy array based union.
I also considered using np.append(a, b) followed by np.unique(x), but both functions flatten their inputs to 1-D, so the vectors lose their shape. np.append does have an axis=0 parameter, which preserves the dimensions of the inserted arrays, but I couldn't implement it efficiently without getting dimension errors.
How can I efficiently implement a vector based set? So that points in the union will be considered as vectors instead of scalars, and will retain their vector form and dimension.
Here are some basic set operations.
Define a pair of lists (they could be np.array([1,2,3]), but that's not what you show):
In [261]: a = [1,2,3]; b=[3,4,5]
A list of several of those:
In [263]: alist = [a, b, a]
In [264]: alist
Out[264]: [[1, 2, 3], [3, 4, 5], [1, 2, 3]]
I can get the unique values by converting the rows to tuples and putting them in a set.
In [265]: set([tuple(i) for i in alist])
Out[265]: {(1, 2, 3), (3, 4, 5)}
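The incremental union from the question can be sketched the same way; here search is a hypothetical stand-in for the question's generator, yielding 1-D vectors that may repeat:

```python
import numpy as np

def search():
    # Hypothetical stand-in for the question's search(): yields
    # 1-D vectors, possibly with repeats.
    yield np.array([1, 2, 3])
    yield np.array([3, 4, 5])
    yield np.array([1, 2, 3])

results = set()
for vec in search():
    # tuples are hashable, so each whole vector acts as one set element
    results.add(tuple(vec.tolist()))

print(sorted(results))   # [(1, 2, 3), (3, 4, 5)]
```

Each vector stays intact as a tuple, and duplicates are discarded by the set.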
I can also convert that list into a 2d array:
In [266]: arr = np.array(alist)
In [267]: arr
Out[267]:
array([[1, 2, 3],
[3, 4, 5],
[1, 2, 3]])
and get the unique rows with unique
and an axis parameter:
In [269]: np.unique(arr, axis=0)
Out[269]:
array([[1, 2, 3],
[3, 4, 5]])
Compare the times
In [270]: timeit np.unique(arr, axis=0)
46.5 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [271]: timeit set([tuple(i) for i in alist])
1.01 µs ± 1.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Converting array to list or list to array adds some time, but the basic pattern remains.
In [272]: timeit set([tuple(i) for i in arr.tolist()])
1.53 µs ± 13.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [273]: timeit np.unique(alist, axis=0)
53.3 µs ± 90.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
For larger, more realistic sources the relative timings may change a bit, but I expect the set of tuples to remain the fastest. Set operations are not a numpy strong point. unique does a sort, followed by an elimination of duplicates; set uses a hashing method, similar to what Python uses for dictionaries.
If you must collect values iteratively from a source, I'd suggest building a list and doing the set/unique step once at the end.
alist = []
for x in source():
    alist.append(x)
or one of:
alist = [x for x in source()]
alist = list(source())
alist = [tuple(x) for x in source()]
alist = [tuple(x.tolist()) for x in source()]
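Putting that together, the collect-then-deduplicate pattern might look like this, with a hypothetical source() generator standing in for the real one:

```python
import numpy as np

def source():
    # hypothetical generator of 1-D vectors, with a duplicate
    for v in ([1, 2, 3], [3, 4, 5], [1, 2, 3]):
        yield np.array(v)

# collect everything first, deduplicate once at the end
alist = [tuple(x.tolist()) for x in source()]
unique_vectors = np.array(sorted(set(alist)))
print(unique_vectors)
# [[1 2 3]
#  [3 4 5]]
```

This keeps the per-item cost at a cheap list append and pays the deduplication cost only once.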