I want to split my data variable into different variables a, b and c, and apply mean to the bins (1st dimension). Is there a way to substantially (e.g. by one order of magnitude) improve this code in terms of speed? General feedback is welcome.
data=rand(20,1000); %generate data
bins=[5 10 5]; %given size of bins
start_bins=cumsum([1 bins(1:end-1)]);
end_bins=cumsum([bins]);
%split the data into 3 cell arrays and apply mean in 1st dimension
binned_data=cellfun(@(x,y) mean(data(x:y,:),1),num2cell(start_bins),num2cell(end_bins),'uni',0);
%data (explicitly) has to be stored into different variables
[a,b,c]=deal(binned_data{:});
whos a b c
Name Size Bytes Class Attributes
a 1x1000 8000 double
b 1x1000 8000 double
c 1x1000 8000 double
You can use splitapply (accumarray's slightly friendlier little brother):
% Your example
data = rand(20,1000); % generate data
bins = [5 10 5]; % given size of bins
% Calculation
bins = repelem(1:numel(bins), bins).'; % Bin sizes to group labels
binned_data = splitapply( @mean, data, bins ); % splitapply for calculation
The rows of binned_data are your a, b and c.
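One detail worth guarding against: mean without a dimension argument averages along a row vector, so a bin containing a single row would return a scalar instead of a 1x1000 row. The following is a minimal sketch of the same splitapply call with the dimension made explicit. The variable name groups and the unpacking into a, b and c are illustrative choices, not part of the answer above.

```matlab
% Sketch: same example as above, with the averaging dimension made explicit
data = rand(20, 1000);                      % example data, as in the question
bins = [5 10 5];                            % bin sizes
groups = repelem(1:numel(bins), bins).';    % group label for each row of data

% mean(x, 1) always averages over rows, so a one-row bin still gives 1x1000
binned_data = splitapply(@(x) mean(x, 1), data, groups);

% Unpack the rows into separate variables, as the question requires
a = binned_data(1, :);
b = binned_data(2, :);
c = binned_data(3, :);
```

Keeping the bin sizes in bins and the labels in a separate variable also leaves bins available for later use.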
The mean can be applied before the splitting, which reduces the data to a vector, and then accumarray can be used:
binned_data = accumarray(repelem(1:numel(bins), bins).', mean(data,2), [], @(x){x.'});
accumarray does not work with matrix data [1]. But you can use sparse, which automatically accumulates data values corresponding to the same indices:
ind_rows = repmat(repelem((1:numel(bins)).', bins), 1, size(data,2));
ind_cols = repmat(1:size(data,2), size(data,1), 1);
binned_data = sparse(ind_rows, ind_cols, data);
binned_data = bsxfun(@rdivide, binned_data, bins(:));
binned_data = num2cell(binned_data, 2).';
[1] But splitapply does work with matrix data. See @Wolfie's answer.
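The sparse approach relies on sparse adding together all values that share the same row and column index. A tiny sketch of that behaviour, with made-up values chosen only for illustration:

```matlab
% Sketch: sparse sums values that share the same (row, column) index
rows = [1 1 2];          % the first two values both target element (1,1)
cols = [1 1 1];
vals = [10 20 5];
S = sparse(rows, cols, vals);
disp(full(S))            % 30 and 5: the duplicates 10 and 20 were added
```

In the answer above, every element of data gets its bin number as row index and its own column as column index, so each bin's rows are summed column by column. Dividing by the bin sizes then turns the sums into means.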
You can use matrix multiplication:
r = 1:numel(bins);
result = (r.' == repelem(r,bins)) * data .* (1./bins(:));
If you want the output as cell:
result = num2cell(result,2);
For large matrices it is better to use a sparse matrix:
result = sparse(r.' == repelem(r,bins)) * data .* (1./bins(:));
Note: In MATLAB versions before R2016b, which lack implicit expansion, you should use bsxfun:
result = bsxfun(@times,bsxfun(@eq, r.',repelem(r,bins)) * data , (1./bins(:)));
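To make the matrix multiplication less opaque, here is a small worked case. The 3x2 data and the bin sizes are invented for illustration, and M is a name introduced here for the indicator matrix. The sketch assumes implicit expansion (MATLAB R2016b or later, or Octave).

```matlab
% Sketch: tiny worked case of the indicator-matrix product
data = [1 2; 3 4; 5 6];                 % 3 rows, 2 columns
bins = [2 1];                           % rows 1-2 form bin 1, row 3 forms bin 2
r = 1:numel(bins);
M = double(r.' == repelem(r, bins));    % M = [1 1 0; 0 0 1], one row per bin
result = (M * data) ./ bins(:);         % bin sums divided by bin sizes
disp(result)                            % [2 3; 5 6]
```

Each row of M selects the rows of data that belong to one bin, so M * data gives the per-bin column sums, and the division by bins(:) converts them to means.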
Here are the timings for the three proposed methods in Octave:
Matrix multiplication: 0.00197697 seconds
Accumarray: 0.00465298 seconds
Cellfun: 0.00718904 seconds
EDIT: For a 200 x 100000 matrix:
Matrix multiplication: 0.806947 seconds (sparse: 0.2331 seconds)
Accumarray: 0.0398011 seconds
Cellfun: 0.386079 seconds
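The script behind these timings is not shown, so the following is only a sketch of how such a comparison could be set up with tic and toc. The matrix size, bin sizes and repetition count (n_rep) are assumptions, and only two of the methods are included.

```matlab
% Sketch of a timing harness (sizes and repetition count are assumptions)
data = rand(200, 1000);
bins = [50 100 50];                       % must sum to size(data, 1)
n_rep = 20;                               % repetitions to average over

% Method 1: indicator-matrix multiplication
r = 1:numel(bins);
tic
for k = 1:n_rep
    res_mm = ((r.' == repelem(r, bins)) * data) ./ bins(:);
end
t_mm = toc / n_rep;

% Method 2: cellfun over start/end indices, as in the question
start_bins = cumsum([1 bins(1:end-1)]);
end_bins = cumsum(bins);
tic
for k = 1:n_rep
    res_cf = cellfun(@(x, y) mean(data(x:y, :), 1), ...
        num2cell(start_bins), num2cell(end_bins), 'UniformOutput', false);
end
t_cf = toc / n_rep;

% Check that both methods agree before comparing speed
assert(max(max(abs(res_mm - vertcat(res_cf{:})))) < 1e-12);

fprintf('Matrix multiplication: %g seconds\n', t_mm);
fprintf('Cellfun: %g seconds\n', t_cf);
```

Absolute numbers depend on the machine, the interpreter and the matrix shape, so only the relative ordering on your own data is meaningful. In MATLAB, timeit is an alternative to a manual tic/toc loop.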