
Mean of columns with same label

I have a data matrix and a label vector with one label per column:

data matrix: A = [1 2 2 1 2 6; 2 3 2 3 3 5]
label vector: B = [1 2 1 2 3 NaN]

I want to take the mean of all columns that have the same label and output these as a matrix sorted by label number, ignoring NaNs. So, in this example I would want:

labelmean(A,B) = [1.5 1.5 2; 2 3 3]

This can be done with a for-loop like this.

function out = labelmean(data,label)
out=[];
for i=unique(label)
    if isnan(i); continue; end
    out = [out, mean(data(:,label==i),2)];
end 

However, I'm dealing with huge arrays containing many datapoints and labels. Additionally, this code snippet will be executed often. I'm wondering if there is a more efficient way to do this without looping over every individual label.

Poelie asked Nov 29 '25 14:11

1 Answer

Here's one approach:

  1. Get the indices of labels not containing NaNs.
  2. Create a sparse indicator matrix of zeros and ones such that multiplying A by it gives, for each label, the sum of the columns of A carrying that label.
  3. Divide each column of that matrix by its sum, so that the sums become averages.
  4. Apply matrix multiplication to get the result, and convert to a full matrix.

Code:

I = find(~isnan(B));                                 % step 1
t = sparse(I, B(I), 1, size(A,2), max(B(I)));        % step 2
t = bsxfun(@rdivide, t, sum(t,1));                   % step 3
result = full(A*t);                                  % step 4
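
The code above uses the label values directly as column indices, so it relies on the labels being positive integers. If a value between 1 and max(B) never occurs, the corresponding column of t is all zeros and the division in step 3 gives NaN for that label. As a hedged variation (not part of the original answer), the third output of unique can remap the labels to 1..K first. The names valid, labels, idx and cols below are introduced only for this sketch:

```matlab
% Example data from the question
A = [1 2 2 1 2 6; 2 3 2 3 3 5];
B = [1 2 1 2 3 NaN];

valid = ~isnan(B);                       % columns that carry a real label
cols  = find(valid);                     % their positions in A
[labels, ~, idx] = unique(B(valid));     % labels sorted; idx maps each valid column to 1..numel(labels)

% Indicator matrix: one row per column of A, one column per distinct label
t = sparse(cols(:), idx(:), 1, size(A,2), numel(labels));
t = bsxfun(@rdivide, t, sum(t,1));       % normalise each column so sums become means
result = full(A*t);                      % one column of means per entry of labels
```

For the data in the question this reproduces [1.5 1.5 2; 2 3 3], and labels records which label each output column belongs to.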

Luis Mendo answered Dec 02 '25 03:12


