Given arrays (say row vectors) A and B, how do I find an array C such that merging B and C will give A?
For example, given
A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [2, 3, 5, 5];
then
C = multiset_diff(A, B) % Should be [4, 6, 4, 3, 1, 5]
(the order of the result does not matter here).
For the same A, if B = [2, 4, 5]
, then the result should be [6, 4, 3, 3, 1, 5, 5]
.
(Since there were two 4
s in A and one 4
in B, the result C should have 2 - 1 = 1 4
in it. Similarly for the other values.)
PS: Note that setdiff
would remove all instances of 2, 3, and 5, whereas here they need to be removed just however many times they appear in B.
Performance: I ran some quick-n-dirty benchmarks locally, here are the results for future reference:
@heigele's nested loop method performs best for small lengths of A (say upto N = 50 or so elements). It does 3x better for small (N=20) A
s, and 1.5x better for medium-sized (N=50) A
s, compared to the next best method - which is:
@obchardon's histc
-based method. This is the one performs the best when A's size N starts to be 100 and above. For eg., this does 3x better than the above nested loop method when N = 200.
@matt's for+find method does comparably to the histc method for small N, but quickly degrades in performance for larger N (which makes sense since the entire C == B(x)
comparison is run every iteration).
(The other methods are either several times slower or invalid at the time of writing.)
Here's a vectorized way. Memory-inefficient, mostly for fun:
tA = sum(triu(bsxfun(@eq, A, A.')), 1);
tB = sum(triu(bsxfun(@eq, B, B.')), 1);
result = setdiff([A; tA].', [B; tB].', 'rows', 'stable');
result = result(:,1).';
The idea is to make each entry unique by tagging it with an occurrence number. The vectors become 2-column matrices, setdiff
is applied with the 'rows'
option, and then the tags are removed from the result.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With