Mean of columns with same label

Question

I have a data matrix and a label vector:

data vector: A = [1 2 2 1 2 6; 2 3 2 3 3 5]
label vector: B = [1 2 1 2 3 NaN]

I want to take the mean of all columns that have the same label and output these as a matrix sorted by label number, ignoring NaNs. So, in this example I would want:

labelmean(A,B) = [1.5 1.5 2; 2 3 3]

This can be done with a for-loop like this.

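function out = labelmean(data,label)
% Mean of the columns of data that share a label; NaN labels are skipped.
out = [];
for i = unique(label(:)).'                  % force a row so the loop visits one label at a time
    if isnan(i); continue; end              % ignore unlabeled columns
    out = [out, mean(data(:,label==i),2)];  % append this label's column mean
end
end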
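As a baseline for comparison, the same loop can be written with a preallocated output so that `out` does not grow on every iteration. This is only a sketch: the name `labelmean_prealloc` is hypothetical, and the script form assumes a MATLAB release that allows local functions in scripts (R2016b or later).

```matlab
% Example data from the question
A = [1 2 2 1 2 6; 2 3 2 3 3 5];
B = [1 2 1 2 3 NaN];

out = labelmean_prealloc(A, B);   % out is [1.5 1.5 2; 2 3 3]
disp(out)

function out = labelmean_prealloc(data, label)
    % Hypothetical helper: same loop as above, with a preallocated output.
    labels = unique(label(~isnan(label)));      % distinct non-NaN labels, sorted
    out = zeros(size(data,1), numel(labels));   % one column per label
    for k = 1:numel(labels)
        out(:,k) = mean(data(:, label == labels(k)), 2);
    end
end
```

This still loops over every label, so it does not answer the question; it only removes the repeated reallocation.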

However, I'm dealing with huge arrays containing many datapoints and labels. Additionally, this code snippet will be executed often. I'm wondering if there is a more efficient way to do this without looping over every individual label.

Luis Mendo · Accepted Answer

Here's one approach:

1. Get the indices of the labels that are not NaN.
2. Create a sparse matrix of zeros and ones such that multiplying A by it gives, for each label, the sum of the columns of A with that label.
3. Divide each column of that matrix by its sum, so that the sums become averages.
4. Apply matrix multiplication to get the result, and convert it to a full matrix.

Code:

I = find(~isnan(B));                                 % step 1
t = sparse(I, B(I), 1, size(A,2), max(B(I)));        % step 2
t = bsxfun(@rdivide, t, sum(t,1));                   % step 3
result = full(A*t);                                  % step 4
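To see what the four steps produce, here is a sketch that runs them on the data from the question and displays the intermediate weight matrix. The second half is a variant that is not part of the answer: it assumes the labels might not be the consecutive integers 1..K, and remaps them with the third output of `unique` first. In the original form, `sparse` needs positive integer labels, and a label value that never occurs below the maximum should leave an all-zero column whose 0/0 division gives NaN. The names `t2`, `idx`, `labels` and `result2` are illustrative.

```matlab
% Example data from the question
A = [1 2 2 1 2 6; 2 3 2 3 3 5];
B = [1 2 1 2 3 NaN];

% The four steps from the answer
I = find(~isnan(B));
t = sparse(I, B(I), 1, size(A,2), max(B(I)));
t = bsxfun(@rdivide, t, sum(t,1));
disp(full(t))          % 6-by-3 averaging weights; row 6 (the NaN label) is all zeros
result = full(A*t);    % result is [1.5 1.5 2; 2 3 3]

% Variant (assumption, not from the answer): remap labels to 1..K first
[labels, ~, idx] = unique(B(I));   % idx(k) is the position of B(I(k)) in labels
t2 = sparse(I(:), idx(:), 1, size(A,2), numel(labels));
t2 = bsxfun(@rdivide, t2, sum(t2,1));
result2 = full(A*t2);  % one column per entry of labels, in sorted order
```

For the example data both versions give the same matrix, since the labels are already 1, 2, 3.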

Tags: optimization, vectorization, vector, matlab, mean
