A simple question, but I'm not so great with MATLAB. I have vectors x
, (n x 1) y
, (m x 1) and w = [x;y]
. I want to define M
(n+m x 1) as M(i) = number of elements of x that are less than or equal to w(i) (w is sorted). This just isn't cutting it:
N = n + m;
M = zeros(N,1);
for i = 1:N
for j = 1:n
if x(j) <= w(i)
M(i) = M(i) + 1;
end
end
end
It's not a particularly smart way to do it, and some of my data vectors m and n are around 100000.
Thanks!
This may look cryptic, but it should give you the same result as your nested loops:
M = histc(-x(:)',[-fliplr(w(:)') inf]);
M = cumsum(fliplr(M(1:N)))';
The above assumes w
has been sorted in ascending order.
Your sorted vector w
can be thought of as bin edges to use in creating a histogram with the HISTC function. Once you count the number of values that fall in each bin (i.e. between the edges), a cumulative sum over those bins using the CUMSUM function will give you your vector M
. The reason the above code looks so messy (with negations and the function FLIPLR in it) is because you want to find values in x
less than or equal to each value in w
, but the function HISTC bins data in the following way:
n(k)
counts the valuex(i)
ifedges(k) <= x(i) < edges(k+1)
.
Notice that less than is used for the upper limit of each bin. You would want to flip the behavior so that you bin according to the rule edges(k) < x(i) <= edges(k+1)
, which can be achieved by negating the values to be binned, negating the edges, flipping the edges (since the edge input to HISTC must be monotonically nondecreasing), and then flipping the bin counts returned. The value inf
is used as an edge value to count everything less than the lowest value in w
in the first bin.
If you wanted to find values in x
that are simply less than each value in w
, the code would be much simpler:
M = histc(x(:)',[-inf w(:)']);
M = cumsum(M(1:N))';
At a minimum the inner loop can be replaced with:
M(i)=sum(x<=w(i))
this will provide substantial performance improvement. You might then consider using arrayfun:
M = arrayfun(@(wi)( sum( x <= wi ) ), w);
arrayfun is less likely to provide substantial gains over the outer for loop but might be worth a try.
edit: I should note that neither w
or x
need to be sorted for this operation to work correctly.
edit: fwiw, I decided to do some actual performance testing, so I ran this program:
n = 100000; m = n;
N = n + m;
x = rand(n, 1);
w = [x; rand(m, 1)];
tic;
M = zeros(N,1);
for i = 1:N
for j = 1:n
if x(j) <= w(i)
M(i) = M(i) + 1;
end
end
end
perf = toc;
fprintf(1, 'Original : %4.3f sec\n', perf);
w = sort(w); % presorted, so don't incur time cost;
tic;
M = histc(-x(:)',[-fliplr(w(:)') inf]);
M = cumsum(fliplr(M(1:N)))';
perf = toc;
fprintf(1, 'gnovice : %4.3f sec\n', perf);
tic;
M = zeros(N,1);
for i = 1:N
M(i)=sum(x<=w(i));
end
perf = toc;
fprintf(1, 'mine/loop : %4.3f sec\n', perf);
tic;
M = arrayfun(@(wi)( sum( x <= wi ) ), w);
perf = toc;
fprintf(1, 'mine/arrayfun : %4.3f sec\n', perf);
and got these results for n = 1000:
Original : 0.086 sec
gnovice : 0.002 sec
mine/loop : 0.042 sec
mine/arrayfun : 0.070 sec
and for n = 100000:
Original : too long to tell ( >> 1m )
gnovice : 0.050 sec
mine/loop : too long to tell ( >> 1m )
mine/arrayfun : too long to tell ( >> 1m )
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With