I'm trying to implement a specific binary search algorithm. results should start as an empty set, and during the search the results variable is updated with the union of each new batch of results we get.
Basically:
results = set()
for result in search():
    results = results.union(result)
But code like that won't work with NumPy arrays, so we use np.union1d for this purpose:
results = np.array([])
for result in search():
    results = np.union1d(results, result)
The code above doesn't really work either: if we have, for example, two vectors a = [1, 2, 3] and b = [3, 4, 5], then np.union1d(a, b) will return:
[1, 2, 3, 4, 5]
But I want it to return:
[[1, 2, 3], [3, 4, 5]]
Since there are no duplicate vectors. And if we then computed, for example, union([[1, 2, 3], [3, 4, 5]], [1, 2, 3]), the return value should remain:
[[1, 2, 3], [3, 4, 5]]
So I would say that I require a numpy array based union.
I also considered using np.append(a, b) followed by np.unique(x), but both functions flatten their inputs to 1-D, so the vectors lose their shape. np.append does have an axis=0 parameter, which preserves the dimensions of the inserted arrays, but I couldn't implement it efficiently without getting dimension errors.
How can I efficiently implement a vector based set? So that points in the union will be considered as vectors instead of scalars, and will retain their vector form and dimension.
Here are some basic set operations.
Define a pair of lists (they could be np.array([1,2,3]), but that's not what you show):
In [261]: a = [1,2,3]; b=[3,4,5]
A list of several of those:
In [263]: alist = [a, b, a]
In [264]: alist
Out[264]: [[1, 2, 3], [3, 4, 5], [1, 2, 3]]
I can get the unique values by converting the rows to tuples and putting them in a set.
In [265]: set([tuple(i) for i in alist])
Out[265]: {(1, 2, 3), (3, 4, 5)}
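The incremental union from the question can be sketched the same way; here search is a hypothetical stand-in for the question's generator, yielding 1-D vectors that may repeat:

```python
import numpy as np

def search():
    # Hypothetical stand-in for the question's search(): yields
    # 1-D vectors, possibly with repeats.
    yield np.array([1, 2, 3])
    yield np.array([3, 4, 5])
    yield np.array([1, 2, 3])

results = set()
for vec in search():
    # tuples are hashable, so each whole vector acts as one set element
    results.add(tuple(vec.tolist()))

print(sorted(results))   # [(1, 2, 3), (3, 4, 5)]
```

Each vector stays intact as a tuple, and duplicates are discarded by the set.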
I can also convert that list into a 2d array:
In [266]: arr = np.array(alist)
In [267]: arr
Out[267]:
array([[1, 2, 3],
[3, 4, 5],
[1, 2, 3]])
and get the unique rows with unique
and an axis parameter:
In [269]: np.unique(arr, axis=0)
Out[269]:
array([[1, 2, 3],
[3, 4, 5]])
Compare the times
In [270]: timeit np.unique(arr, axis=0)
46.5 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [271]: timeit set([tuple(i) for i in alist])
1.01 µs ± 1.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Converting array to list or list to array adds some time, but the basic pattern remains.
In [272]: timeit set([tuple(i) for i in arr.tolist()])
1.53 µs ± 13.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [273]: timeit np.unique(alist, axis=0)
53.3 µs ± 90.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
For larger, more realistic sources the relative timings may change a bit, but I expect the set of tuples to remain the fastest. Set operations are not a numpy strong point. unique does a sort, followed by an elimination of duplicates; set uses a hashing method, similar to what Python uses for dictionaries.
If you must collect values iteratively from a source, I'd suggest building a list and doing the set/unique step once at the end.
alist = []
for x in source():
    alist.append(x)
or one of:
alist = [x for x in source()]
alist = list(source())
alist = [tuple(x) for x in source()]
alist = [tuple(x.tolist()) for x in source()]
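Putting that together, the collect-then-deduplicate pattern might look like this, with a hypothetical source() generator standing in for the real one:

```python
import numpy as np

def source():
    # hypothetical generator of 1-D vectors, with a duplicate
    for v in ([1, 2, 3], [3, 4, 5], [1, 2, 3]):
        yield np.array(v)

# collect everything first, deduplicate once at the end
alist = [tuple(x.tolist()) for x in source()]
unique_vectors = np.array(sorted(set(alist)))
print(unique_vectors)
# [[1 2 3]
#  [3 4 5]]
```

This keeps the per-item cost at a cheap list append and pays the deduplication cost only once.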