I have an array S
, which has some number of unique elements. I want to append elements from array N
which are not already in S
.
A syntactically simple way to do this would be:
S = union( S, N, 'stable' );
I found manually appending could be quicker, using ismember
or implicit expansion:
% ismember approach
S = [S; N(~ismember(N,S))];
% imp. expansion approach
S = [S; N(~any(S(:)==N(:).',1))];
However, this still feels pretty dirty to do inside a loop, and the implicit expansion can be expensive for large inputs.
Is there a more performant alternative?
If it helps, we can assume S
and N
contain only integers. However, we can't assume that S
is sorted, new elements appended from N
could be any positive integer.
Minimal example:
Ntest = [1 2 3 4
2 5 3 6
1 5 7 9];
S = [];
for ii = 1:3
N = Ntest(ii,:).';
S = union(S,N,'stable');
end
% S = [ 1; 2; 3; 4; 5; 6; 7; 9 ]
In the real case, I don't know the potential values of N
up front like I did with Ntest
above.
Here is some benchmarking code for 4 methods, with the following results. In my case, it's likely that I'll have a large loop for different values of N
, and a small number of elements in each N
. This corresponds to the right-most columns in this summary table, where you can see the implicit expansion method is much quicker.
range(Ntest): 1 to 1e4 1 to 1e4 1 to 1e4 1 to 1e4
size(Ntest): [1e3,1e3] [1e4,1e3] [1e2,1e3] [1e2,1e4]
union: 0.972 sec 1.217 sec 0.779 sec 9.341 sec
ismember: 0.763 sec 0.559 sec 0.492 sec 5.439 sec
implicit: 6.966 sec too long! 0.295 sec 3.886 sec
setdiff: 0.599 sec 0.534 sec 0.477 sec 5.364 sec
rng(0);
Ntest = randi([1,1e4],1e3,1e3);
f = @()f_union( Ntest );
fprintf( 'union: \t%.3f sec\n', timeit( f ) );
f = @()f_ismember( Ntest );
fprintf( 'ismember: \t%.3f sec\n', timeit( f ) );
f = @()f_implicit( Ntest );
fprintf( 'implicit: \t%.3f sec\n', timeit( f ) );
f = @()f_setdiff( Ntest );
fprintf( 'setdiff: \t%.3f sec\n', timeit( f ) );
function f_union( Ntest )
S = [];
for ii = 1:size(Ntest,2)
N = Ntest(:,ii);
S = union(S,N,'stable');
end
end
function f_ismember( Ntest )
S = [];
for ii = 1:size(Ntest,2)
N = Ntest(:,ii);
S = [S; N(~ismember(N,S))];
end
end
function f_implicit( Ntest )
S = [];
for ii = 1:size(Ntest,2)
N = Ntest(:,ii);
S = [S; N(~any(S(:)==N(:).',1))];
end
end
function f_setdiff( Ntest )
S = [];
for ii = 1:size(Ntest,2)
N = Ntest(:,ii);
S = [S;setdiff(N,S)];
end
end
Since it is assumed that the data type is positive integer you can use logical matrix to store position of integers:
function f_logical( Ntest )
S = false;
for ii = 1:size(Ntest,2)
N = Ntest(:,ii);
S(N) = true;
end
end
If the range of elements is large and the data has sparsity it may be beneficial to use sparse matrix:
function f_sparse( Ntest )
S = sparse(false);
for ii = 1:size(Ntest,2)
N = Ntest(:,ii);
S(N) = true;
end
end
Comparing with the ismember
solution in Octave:
Elapsed time for <ismember> is 1.54181 seconds.
Elapsed time for <sparse> is 0.266474 seconds.
Elapsed time for <logical> is 0.0189412 seconds.
I guess you can use the following code to speed up
X = setdiff(N,S);
S(end + (1:length(X))) = X;
X = N(~ismember(N,S))
and X = setdiff(N,S)
are both fine to find the elements that in N
but not in S
, but the key step for speeding up the appending process is the following wayS(end + (1:length(X))) = X;
rng(0);
Ntest = randi([1,1e4],1e4,1e4);
f = @()f_union( Ntest );
fprintf( 'union: \t%.3f sec\n', timeit( f ) );
f = @()f_ismember_v1( Ntest );
fprintf( 'ismember_v1: \t%.3f sec\n', timeit( f ) );
f = @()f_ismember_v2( Ntest );
fprintf( 'ismember_v2: \t%.3f sec\n', timeit( f ) );
f = @()f_setdiff_v1( Ntest );
fprintf( 'setdiff_v1: \t%.3f sec\n', timeit( f ) );
f = @()f_setdiff_v2( Ntest );
fprintf( 'setdiff_v2: \t%.3f sec\n', timeit( f ) );
function f_union( Ntest )
S = [];
for ii = 1:size(Ntest,2)
N = Ntest(:,ii);
S = union(S,N,'stable');
end
end
function f_ismember_v1( Ntest )
S = [];
for ii = 1:size(Ntest,2)
N = Ntest(:,ii);
S = [S; N(~ismember(N,S))];
end
end
function f_ismember_v2( Ntest )
S = [];
for ii = 1:size(Ntest,2)
N = Ntest(:,ii);
X = N(~ismember(N,S));
S(end + (1:length(X))) = X;
end
end
function f_setdiff_v1( Ntest )
S = [];
for ii = 1:size(Ntest,2)
N = Ntest(:,ii);
S = [S;setdiff(N,S)];
end
end
function f_setdiff_v2( Ntest )
S = [];
for ii = 1:size(Ntest,2)
N = Ntest(:,ii);
X = setdiff(N,S);
S(end + (1:length(X))) = X;
end
end
giving
union: 13.314 sec
ismember_v1: 5.836 sec
ismember_v2: 5.658 sec
setdiff_v1: 4.371 sec
setdiff_v2: 4.248 sec
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With