
Letting accumarray output a table

accumarray takes a matrix of subscripts (here two columns, one per dimension) and builds a matrix in which each element addressed by a valid subscript pair holds the values collected for that pair, combined by the specified function (summed by default), e.g.:

A = (11:20)';   % column vectors, so that [A B] has one subscript pair per row
B = flipud(A);
C = (1:10)';
datamatrix = accumarray([A B],C);

This way datamatrix will be a 20x20 matrix whose only nonzero entries sit at the ten subscript pairs. If the values in A and B are very large, however, the result is a mostly empty matrix with a small batch of data in the far corner. To circumvent this, one can set the issparse flag of accumarray to true:

sparsedatamatrix = accumarray([A B],C,[],@sum,[],true);

This will save a lot of memory in case min(A) and/or min(B) is/are very large.
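To make the memory argument concrete, here is a minimal sketch with made-up subscripts around 10000 (the offsets and the variable names S and info are only illustrative). A full double matrix of this size would need roughly 0.8 GB, whereas the sparse result stores only the ten occupied entries plus some per-column bookkeeping:

```matlab
% Hypothetical large subscripts, chosen only for illustration.
A = (10001:10010)';     % row subscripts (column vector)
B = flipud(A);          % column subscripts (column vector)
C = (1:10)';            % one value per subscript pair

% Sparse output: fill value must be 0 (or []), last argument is issparse.
S = accumarray([A B], C, [], @sum, 0, true);

info = whos('S');       % query the storage actually used by S
fprintf('%d x %d matrix, %d nonzeros, %d bytes\n', ...
        size(S,1), size(S,2), nnz(S), info.bytes);
```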

My problem, however, is that I have an Mx7 matrix, with M~1e8, for which I want the means of columns three through seven, grouped by the indices in the first two columns, and the standard deviation of the third column, grouped by the same indices:

result = accumarray([data(:,1) data(:,2)],data(:,3),[],@std);

I want to save this back into a table structured as [X Y Z std R G B I], where X and Y are the indices, Z is the average height of that pixel, R, G, B and I are mean values (colour and intensity) per pixel, and std is the standard deviation of the heights (i.e. the roughness). Using issparse does not help in this case, since I transform the matrices resulting from accumarray using repmat.

The point of this code is to estimate the height, roughness, colour and intensity of a piece of land from a point cloud. I rounded the coordinates in X and Y to create a grid and now need those average values per grid cell, but output as a "table" (not the MATLAB data type, but a 2D array which is not the default matrix output).

So, to conclude with the question:

Is there a way for accumarray or a similar function to output this table without an intermediate (potentially very large) matrix?

Code below:

Xmax = max(Originaldata(:,1));
Ymax = max(Originaldata(:,2));
X_avg_grid=(Edgelength:Edgelength:Xmax*Edgelength)+Xorig;
TestSet = zeros(Xmax*Ymax,9);

xx = [1:length(X_avg_grid)]'; %#ok<*NBRAK>
TestSet(:,1) = repmat(xx,Ymax,1);
ll = 0:Xmax:Xmax*Ymax;
for jj = 1:Ymax
    TestSet(ll(jj)+1:ll(jj+1),2) = jj;
end

for ll = 1:7
    if ll == 2
        tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],Originaldata(:,3),[],@std);
        tempdat = reshape(tempdat,[],1);
        TestSet(:,ll+2) = tempdat;
    elseif ll == 7
        tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],1);
        tempdat = reshape(tempdat,[],1);
        TestSet(:,ll+2) = tempdat;
    elseif ll == 1
        tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],Originaldata(:,3),[],@mean);
        tempdat = reshape(tempdat,[],1);
        TestSet(:,ll+2) = tempdat;
    else
        tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],Originaldata(:,ll+1),[],@mean);
        tempdat = reshape(tempdat,[],1);
        TestSet(:,ll+2) = tempdat;
    end
end

TestSet = TestSet(~(TestSet(:,9)==0),:);

The ninth column here is just the amount of points per cell.

Originaldata = 
19  36  2.20500360107422    31488   31488   31488   31611
20  37  2.26400360107422    33792   33792   34304   33924
20  37  2.20000360107422    33536   33536   34048   33667
19  36  2.20500360107422    34560   34560   34560   34695
20  36  2.23300360107422    32512   32512   33024   32639
21  38  2.22000360107422    31744   31488   33024   31611
21  37  2.20400360107422    32512   32768   33792   32896
21  37  2.24800360107422    29696   29440   30720   29555
21  38  2.34800360107422    32768   32768   32768   32639
21  37  2.23000360107422    33024   33024   33536   33153

So all points with the same X,Y (e.g. [19 36] or [21 37]) are averaged (height, RGB, intensity, in that order), and the standard deviation of the values in the third column is also computed:

Result = 
19  36  2.2050036   0.00        33024   33024   33024       33153
21  37  2.227336934 0.02212088  31744   31744   32682.66    31868

and so forth for the rest of the data.

I updated my code to the latest version I have. This reduced memory overhead quite a bit, as the function now creates the grids one after another as opposed to all at once. However, the code is running in parallel so there are still eight simultaneous grids created, so a solution would still be appreciated.

Adriaan asked Aug 24 '15 12:08


3 Answers

A sketch of a solution using linear indices and a 2D sparse matrix:

lind = Originaldata(:,1) + max( Originaldata(:,1) ) * ( Originaldata(:,2) - 1 );
daccum(7,:) = accumarray( lind, 1, [], @sum, [], true ); %// start with last one to pre-allocate all daccum
daccum(1,:) = accumarray( lind, Originaldata(:,3), [], @mean, [], true );
daccum(2,:) = accumarray( lind, Originaldata(:,3), [], @std, [], true );
daccum(3,:) = accumarray( lind, Originaldata(:,4), [], @mean, [], true );
daccum(4,:) = accumarray( lind, Originaldata(:,5), [], @mean, [], true );
daccum(5,:) = accumarray( lind, Originaldata(:,6), [], @mean, [], true );
daccum(6,:) = accumarray( lind, Originaldata(:,7), [], @mean, [], true );

Now you can get only what you need

inter = [Originaldata(:,1), Originaldata(:,2), full( daccum(:,lind) )' ];
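Note that indexing with lind returns one row per original point, so cells with several points appear several times. If one row per occupied cell is wanted instead, a possible variation (a sketch only; counts, occ and cellTable are names I made up, and only the height columns are shown) is to take the occupied cells from the sparse point count and convert them back to X and Y with ind2sub. The count is used to detect occupancy because a mean or standard deviation can legitimately be zero:

```matlab
% Sample data from the question (columns: X Y Z R G B I).
Originaldata = [ ...
    19 36 2.20500360107422 31488 31488 31488 31611; ...
    20 37 2.26400360107422 33792 33792 34304 33924; ...
    20 37 2.20000360107422 33536 33536 34048 33667; ...
    19 36 2.20500360107422 34560 34560 34560 34695; ...
    20 36 2.23300360107422 32512 32512 33024 32639; ...
    21 38 2.22000360107422 31744 31488 33024 31611; ...
    21 37 2.20400360107422 32512 32768 33792 32896; ...
    21 37 2.24800360107422 29696 29440 30720 29555; ...
    21 38 2.34800360107422 32768 32768 32768 32639; ...
    21 37 2.23000360107422 33024 33024 33536 33153];

nX   = max(Originaldata(:,1));
nY   = max(Originaldata(:,2));
lind = Originaldata(:,1) + nX * (Originaldata(:,2) - 1);   % linear cell index

counts = accumarray(lind, 1, [], @sum, 0, true);   % sparse points-per-cell vector
occ    = find(counts);                             % linear indices of occupied cells
[X, Y] = ind2sub([nX nY], occ);                    % back to grid subscripts

Zmean = accumarray(lind, Originaldata(:,3), [], @mean, 0, true);
Zstd  = accumarray(lind, Originaldata(:,3), [], @std,  0, true);

% One row per occupied cell: [X Y mean(Z) std(Z) count]
cellTable = [X, Y, full(Zmean(occ)), full(Zstd(occ)), full(counts(occ))];
```

The remaining columns (R, G, B, I) would follow the same pattern as Zmean.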
Shai answered Nov 15 '22 02:11


You can first use unique with the 'rows' option to find the indices of the unique pairs of X and Y coordinates, then instead use those indices as the subscript input in your calls to accumarray (you'll have to call it separately for each column, since accumarray doesn't handle matrix inputs):

[xyPairs, ~, index] = unique(Originaldata(:, 1:2), 'rows');
nPairs = max(index);
Result = [xyPairs ...
          accumarray(index, Originaldata(:, 3), [nPairs 1], @mean) ...
          accumarray(index, Originaldata(:, 3), [nPairs 1], @std) ...
          accumarray(index, Originaldata(:, 4), [nPairs 1], @mean) ...
          accumarray(index, Originaldata(:, 5), [nPairs 1], @mean) ...
          accumarray(index, Originaldata(:, 6), [nPairs 1], @mean) ...
          accumarray(index, Originaldata(:, 7), [nPairs 1], @mean) ...
          accumarray(index, ones(size(index)), [nPairs 1], @sum)];
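With M~1e8 rows, passing function handles such as @mean and @std to accumarray can be noticeably slower than its built-in default summation. A possible variation, sketched under that assumption, derives the means and the standard deviation from per-cell sums only. The names n, sums, zc and zstd are illustrative, and the heights are centred on their global mean first to limit round-off in the variance formula:

```matlab
% Sample data from the question (columns: X Y Z R G B I).
Originaldata = [ ...
    19 36 2.20500360107422 31488 31488 31488 31611; ...
    20 37 2.26400360107422 33792 33792 34304 33924; ...
    20 37 2.20000360107422 33536 33536 34048 33667; ...
    19 36 2.20500360107422 34560 34560 34560 34695; ...
    20 36 2.23300360107422 32512 32512 33024 32639; ...
    21 38 2.22000360107422 31744 31488 33024 31611; ...
    21 37 2.20400360107422 32512 32768 33792 32896; ...
    21 37 2.24800360107422 29696 29440 30720 29555; ...
    21 38 2.34800360107422 32768 32768 32768 32639; ...
    21 37 2.23000360107422 33024 33024 33536 33153];

[xyPairs, ~, index] = unique(Originaldata(:, 1:2), 'rows');
n = accumarray(index, 1);                      % points per cell

% Per-cell sums of columns 3..7 using accumarray's default (sum).
sums = zeros(numel(n), 5);
for k = 1:5
    sums(:, k) = accumarray(index, Originaldata(:, k+2));
end
means = bsxfun(@rdivide, sums, n);             % per-cell means of Z, R, G, B, I

% Sample standard deviation of Z from sums of centred values.
zc    = Originaldata(:, 3) - mean(Originaldata(:, 3));
sumC  = accumarray(index, zc);
sumC2 = accumarray(index, zc.^2);
zvar  = (sumC2 - sumC.^2 ./ n) ./ max(n - 1, 1);   % yields 0 for single-point cells
zstd  = sqrt(max(zvar, 0));                        % guard against tiny negative round-off

% Same layout as above: [X Y Z std R G B I count]
Result = [xyPairs, means(:, 1), zstd, means(:, 2:5), n];
```

Whether this is actually faster on the real data would need to be timed; the sum-of-squares formula is also less accurate than std when the spread is tiny compared with the values, which is why the centring step is included.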
gnovice answered Nov 15 '22 03:11


You could pre-process the data.

One thing you can achieve this way is removing undesired lines (such as those having two or fewer occurrences), so that you don't have to deal with a standard deviation of 0:

%// Count occurrences:
combined_coord = Originaldata(:,1)*1E6+Originaldata(:,2); %// "concatenating" the coords (assumes Y < 1E6)
[C,~,ic] = unique(combined_coord);
occurences = [C accumarray(ic,1)];
%// Find all points that have <=2 occurrences:
coords_to_remove = occurences((occurences(:,2)<=2),1);
%// Find valid lines (ismember avoids the M-by-K comparison matrix that bsxfun(@eq,...) would build):
valid_lns = ~ismember(combined_coord,coords_to_remove);
%// Filter original data:
new_data = Originaldata(valid_lns,:);
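The same filter can also be written without combining the coordinates into one number, by mapping the per-cell counts back onto the points through the group index. This is only a sketch; counts and keep are names chosen for illustration:

```matlab
% Sample data from the question (columns: X Y Z R G B I).
Originaldata = [ ...
    19 36 2.20500360107422 31488 31488 31488 31611; ...
    20 37 2.26400360107422 33792 33792 34304 33924; ...
    20 37 2.20000360107422 33536 33536 34048 33667; ...
    19 36 2.20500360107422 34560 34560 34560 34695; ...
    20 36 2.23300360107422 32512 32512 33024 32639; ...
    21 38 2.22000360107422 31744 31488 33024 31611; ...
    21 37 2.20400360107422 32512 32768 33792 32896; ...
    21 37 2.24800360107422 29696 29440 30720 29555; ...
    21 38 2.34800360107422 32768 32768 32768 32639; ...
    21 37 2.23000360107422 33024 33024 33536 33153];

[~, ~, ic] = unique(Originaldata(:, 1:2), 'rows');  % group index of each point
counts     = accumarray(ic, 1);                     % points per cell
keep       = counts(ic) > 2;                        % true for points in cells with >2 points
new_data   = Originaldata(keep, :);
```

On the sample data only the cell [21 37] has more than two points, so new_data keeps its three rows.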
Dev-iL answered Nov 15 '22 01:11