Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding (multiset) difference between two arrays

Given arrays (say row vectors) A and B, how do I find an array C such that merging B and C will give A?

For example, given

A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [2, 3, 5, 5];

then

C = multiset_diff(A, B) % Should be [4, 6, 4, 3, 1, 5]

(the order of the result does not matter here).

For the same A, if B = [2, 4, 5], then the result should be [6, 4, 3, 3, 1, 5, 5].

(Since there were two 4s in A and one 4 in B, the result C should have 2 - 1 = 1 4 in it. Similarly for the other values.)

PS: Note that setdiff would remove all instances of 2, 3, and 5, whereas here they need to be removed just however many times they appear in B.


Performance: I ran some quick-n-dirty benchmarks locally, here are the results for future reference:

  • @heigele's nested loop method performs best for small lengths of A (say upto N = 50 or so elements). It does 3x better for small (N=20) As, and 1.5x better for medium-sized (N=50) As, compared to the next best method - which is:

  • @obchardon's histc-based method. This is the one performs the best when A's size N starts to be 100 and above. For eg., this does 3x better than the above nested loop method when N = 200.

@matt's for+find method does comparably to the histc method for small N, but quickly degrades in performance for larger N (which makes sense since the entire C == B(x) comparison is run every iteration).

(The other methods are either several times slower or invalid at the time of writing.)

like image 483
Sundar R Avatar asked Aug 13 '18 19:08

Sundar R


1 Answers

Here's a vectorized way. Memory-inefficient, mostly for fun:

tA = sum(triu(bsxfun(@eq, A, A.')), 1);
tB = sum(triu(bsxfun(@eq, B, B.')), 1);
result = setdiff([A; tA].', [B; tB].', 'rows', 'stable');
result = result(:,1).';

The idea is to make each entry unique by tagging it with an occurrence number. The vectors become 2-column matrices, setdiff is applied with the 'rows' option, and then the tags are removed from the result.

like image 142
Luis Mendo Avatar answered Oct 06 '22 03:10

Luis Mendo