I have a data matrix and a label vector:
data matrix: A = [1 2 2 1 2 6; 2 3 2 3 3 5]
label vector: B = [1 2 1 2 3 NaN]
I want to take the mean of all columns that have the same label and output these as a matrix sorted by label number, ignoring NaNs. So, in this example I would want:
labelmean(A,B) = [1.5 1.5 2; 2 3 3]
This can be done with a for-loop like this.
function out = labelmean(data,label)
    out = [];
    for i = unique(label)
        if isnan(i); continue; end
        out = [out, mean(data(:,label==i),2)];
    end
end
However, I'm dealing with huge arrays containing many datapoints and labels. Additionally, this code snippet will be executed often. I'm wondering if there is a more efficient way to do this without looping over every individual label.
Here's one approach:
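1. Find the indices of the labels that are not NaN.
2. Build a sparse matrix t of zeros and ones, with one row per column of A and one column per label, such that A*t would give the desired sums of the columns sharing each label.
3. Divide each column of t by its sum, so that those sums become means.
4. Multiply A by t and convert the result to a full matrix.
Code: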
I = find(~isnan(B)); % step 1
t = sparse(I, B(I), 1, size(A,2), max(B(I))); % step 2
t = bsxfun(@rdivide, t, sum(t,1)); % step 3
result = full(A*t); % step 4
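As a quick check, here is the same approach run on the example data from the question. This is only a sketch: it assumes the labels are positive integers (sparse uses them as column indices), and a label value that never occurs between 1 and max(B) would produce a NaN column because of the 0/0 division in step 3.

```matlab
% Example data from the question
A = [1 2 2 1 2 6; 2 3 2 3 3 5];
B = [1 2 1 2 3 NaN];

I = find(~isnan(B));                            % step 1: columns with a usable label
t = sparse(I, B(I), 1, size(A,2), max(B(I)));   % step 2: t(j,k) = 1 if column j has label k
t = bsxfun(@rdivide, t, sum(t,1));              % step 3: each column of t now sums to 1
result = full(A*t);                             % step 4: per-label means, one column per label

disp(result)
```

For this input, t has 6 rows and 3 columns, and result matches the matrix the question asks for, [1.5 1.5 2; 2 3 3]. The NaN-labelled sixth column of A gets an all-zero row in t, so it contributes to no mean.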