
How do I average every n columns of each row in a matrix?

I have a monthly returns matrix of size 1000x300. I would like to average every 12 columns of each row to get annual returns, which would give me a 1000x25 matrix.

How would I go about doing this in Matlab?

Through some quick searching, I believe I can use the reshape function somehow, but I am having trouble figuring out how to implement it in my code's loop.

So far, this is my attempt.

for i = 1:25
Strategy1.MeanReturn(:,i) = mean(Data.Return(:,i+1):Data.Return(:,i*12+1));
end

FYI, the +1 is there because I am ignoring the first column of the matrix.

But this gives me a single NaN value.

rahulk92 asked Jun 02 '16 06:06


3 Answers

You can stack the desired submatrices along the first dimension of a 3D array, then do the average along that dimension, and squeeze out the resulting singleton dimension:

x = rand(10,20); % example data. 1000x300 in your case
N = 4; % group size. 12 in your case
y = reshape(x.', N, size(x,2)/N, []);
result = squeeze(mean(y,1)).';
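
To see what each step does, here is a minimal sketch on a tiny matrix. The 2x6 values and the intermediate names `xt` and `m` are illustrative assumptions, chosen so the group means are easy to check by hand:

```matlab
% Tiny example: 2 rows, 6 columns, groups of N = 3 columns
x = [1 2 3 4 5 6; 7 8 9 10 11 12];
N = 3;

xt = x.';                              % 6x2: each original row is now a column
y  = reshape(xt, N, size(x,2)/N, []);  % 3x2x2: group members run along dimension 1
m  = mean(y, 1);                       % 1x2x2: one mean per group and per original row
result = squeeze(m).';                 % 2x2: rows = original rows, columns = groups

% Cross-check against direct column-block means
expected = [mean(x(:,1:3),2), mean(x(:,4:6),2)];
assert(isequal(result, expected));
disp(result)                           % [2 5; 8 11]
```

The transpose is needed because reshape fills column by column: after `x.'`, consecutive elements of one original row are contiguous and fall into the same group.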

Luis Mendo answered Oct 06 '22 01:10


Try this:

B = zeros(1000,25);
A = rand(1000,300);
for i = 1:25    
    B(:,i) = mean(A(:,(i-1)*12+1:i*12),2); 
end

I just tested it by computing sums over a matrix of ones, and it worked.
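
The question also mentions skipping the first column. A minimal sketch of the same loop with a column offset follows. It assumes the data has 301 columns (one ignored column plus 300 monthly returns); `R`, `firstCol`, `nYears` and `M` are hypothetical names, with `R` standing in for Data.Return. One likely reason the original attempt returns NaN is that its colon operator is applied to matrix values rather than to column indices; here the colon builds the index range instead.

```matlab
R = rand(1000, 301);   % stand-in for Data.Return (assumed: 1 ignored column + 300 months)
N = 12;                % months per year
firstCol = 2;          % first column that holds return data

nYears = floor((size(R,2) - firstCol + 1) / N);   % 25 for the sizes assumed above
M = zeros(size(R,1), nYears);                     % pre-allocate the result

for i = 1:nYears
    % Column indices of year i: 2:13, 14:25, ..., 290:301
    cols = firstCol + (i-1)*N : firstCol + i*N - 1;
    M(:,i) = mean(R(:,cols), 2);                  % average across columns, per row
end
```

Using `floor` means any incomplete trailing year is dropped rather than causing an index error.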

bushmills answered Oct 05 '22 23:10


Loops aren't always slow. In fact, tests performed by MathWorks have shown that code ran on average 40% faster in R2015b thanks to the redesigned execution engine (JIT), and for-loops are among the cases that benefit most:

The average performance improvement across all tests was 40%. Tests consisted of code that used a range of MATLAB products. Although not all applications ran faster with the redesign, the majority of these applications ran at least 10% faster in R2015b than in R2015a.

and

The performance benefit of JIT compilation is greatest when MATLAB code is executed additional times and can re-use the compiled code. This happens in common cases such as for-loops or when applications are run additional times in a MATLAB session


A quick benchmark of the three solutions:

%% bushmills answer, saved as bushmills.m
function B = bushmills(A,N)
B = zeros(size(A,1),size(A,2)/N);
for i = 1:size(A,2)/N   
    B(:,i) = mean(A(:,(i-1)*N+1:i*N),2); % use N rather than a hard-coded 12
end
end

A = rand(1000,300); N = 12;

%% Luis Mendo's answer:
lmendo = @(A,N) squeeze(mean(reshape(A.', N, size(A,2)/N, []))).';

%% Divakar's answer:
divakar = @(A,N) reshape(mean(reshape(A,size(A,1),N,[]),2),size(A,1),[]);

b = @() bushmills(A,N);
l = @() lmendo(A,N);
d = @() divakar(A,N);

sprintf('Bushmill: %e\nLuis Mendo: %e\nDivakar: %e', timeit(b), timeit(l), timeit(d))
ans =
Bushmill: 1.102774e-03
Luis Mendo: 1.611329e-03
Divakar: 1.888878e-04

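tb = timeit(b); tl = timeit(l); td = timeit(d); % stored from a separate timing run, so the ratios need not match the figures above
sprintf('Relative to fastest approach:\nDivakar: %0.5f\nBushmill: %0.5f\nLuis Mendo: %0.5f', 1, tb/td, tl/td)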
ans =
Relative to fastest approach:
Divakar: 1.00000
Bushmill: 5.34464
Luis Mendo: 10.73969

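The loop approach (with pre-allocation) is faster than the squeeze(mean(reshape(...))) solution: it takes roughly 30% less time in the absolute timings above and about half the time in the relative ones. Divakar's solution beats both by a mile, running roughly 5 to 10 times faster.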
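
Since Divakar's one-liner is only quoted in the benchmark, here is a minimal sketch that unpacks it on a tiny matrix. The 2x6 values and the names `pages`, `pageMeans` and `check` are illustrative assumptions. It may be faster because it needs no transpose and reads the data in its existing column-major order, but that explanation is a guess and was not measured here.

```matlab
% Tiny example: 2 rows, 6 columns, groups of N = 3 columns
A = [1 2 3 4 5 6; 7 8 9 10 11 12];
N = 3;
nRows = size(A,1);

pages     = reshape(A, nRows, N, []);     % 2x3x2: page k holds columns (k-1)*N+1 : k*N
pageMeans = mean(pages, 2);               % 2x1x2: row-wise mean within each page
result    = reshape(pageMeans, nRows, []); % 2x2: one column per group

% Cross-check against the squeeze(mean(reshape(...))) variant
check = squeeze(mean(reshape(A.', N, size(A,2)/N, []), 1)).';
assert(isequal(result, check));
disp(result)                              % [2 5; 8 11]
```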


The results might be different for other sizes of A and other values of N, but I haven't tested them all.
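
To check other sizes, something along these lines could be used. It is only a sketch: the file name compare_timings.m, the helper loopMean, and the chosen sizes and group sizes are all assumptions made for illustration, and no timings are claimed for it.

```matlab
function compare_timings()
% Hypothetical helper, saved as compare_timings.m.
% Times the three approaches for a few matrix sizes and group sizes.
sizes = [200 120; 1000 300; 2000 600];   % [rows cols], illustrative choices
groupSizes = [4 12];                     % every column count above is divisible by both

for s = 1:size(sizes,1)
    A = rand(sizes(s,1), sizes(s,2));
    for N = groupSizes
        tLoop    = timeit(@() loopMean(A,N));
        tSqueeze = timeit(@() squeeze(mean(reshape(A.', N, size(A,2)/N, []), 1)).');
        tReshape = timeit(@() reshape(mean(reshape(A,size(A,1),N,[]),2),size(A,1),[]));
        fprintf('%4dx%-4d N=%2d  loop %.3e s  squeeze %.3e s  reshape %.3e s\n', ...
            size(A,1), size(A,2), N, tLoop, tSqueeze, tReshape);
    end
end
end

function B = loopMean(A,N)
% Pre-allocated loop over column groups of width N
K = size(A,2)/N;
B = zeros(size(A,1), K);
for i = 1:K
    B(:,i) = mean(A(:,(i-1)*N+1:i*N), 2);
end
end
```

Keeping the loop in a local function lets timeit call it through a handle, in the same way the benchmark above calls bushmills.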

Stewie Griffin answered Oct 06 '22 01:10