Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast way to compute (1:N)'*(1:N)

Tags:

matlab

I am looking for a fast way to compute

(1:N)'*(1:N)

for reasonably large N. I feel like the symmetry of the problem makes it so that actually doing the multiplications and additions is wasteful.

like image 234
StrongBad Avatar asked Oct 01 '13 17:10

StrongBad


3 Answers

The question of why you want to do this really matters.

In the theoretical sense, the triangular approach suggested in the other answers will save you operations. @jgmao's answer is especially interesting in reducing multiplies.

In the practical sense, number of CPU operations is no longer the metric to minimize when writing fast code. Memory bandwidth dominates when you have so few CPU operations, so tuned cache-aware access patterns are how to make this go fast. Matrix multiplication code is implemented extremely efficiently, since it's such a common operation, and every implementation of the BLAS numeric library worth its salt will use optimized access patterns, and SIMD computation as well.

Even if you wrote straight C and reduced your op count to the theoretic minimum, you'd probably still not beat the full matrix multiply. What this boils down to is to find the numeric primitive which most closely matches your operation.

All that said, there's a BLAS operation which gets a little closer than DGEMM (matrix multiply). It's called DSYRK, the rank-k update, and it can be used for exactly A'*A. The MEX function I wrote for this a long time ago is here. I haven't messed with it in a long time, but it did work when I first wrote it, and did in fact run faster than a straight A'*A.

/* xtrx.c: calculates x'*x taking advantage of the symmetry.
Peter Boettcher <email removed>
Last modified: <Thu Jan 23 13:53:02 2003> */

#include "mex.h"

const double one = 1;
const double zero = 0;

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
  double *x, *z;
  int i, j, mrows, ncols;

  if(nrhs!=1) mexErrMsgTxt("One input required.");

  x = mxGetPr(prhs[0]);
  mrows = mxGetM(prhs[0]);
  ncols = mxGetN(prhs[0]);

  plhs[0] = mxCreateDoubleMatrix(ncols,ncols, mxREAL);
  z = mxGetPr(plhs[0]);

  /* Call the FORTRAN BLAS routine for rank k update */
  dsyrk_("U", "T", &ncols, &mrows, &one, x, &mrows, &zero, z, &ncols);

  /* Result is in the upper triangle.  Copy it down the lower part */
  for(i=0; i<ncols; i++)
      for(j=i+1; j<ncols; j++)
          z[i*ncols + j] = z[j*ncols + i];
}
like image 61
Peter Avatar answered Sep 17 '22 21:09

Peter


MATLAB's matrix multiplication is generally pretty fast, but here are a couple of ways to get just the upper triangular matrix. They are slower than naïvely computing the v'*v (or using a MEX wrapper that calls the more appropriate symmetric rank k update function in BLAS, not surprisingly!). Anyway, here are a few MATLAB-only solutions:

The first uses linear indexing:

% test vector
N = 1e3;
v = 1:N;

% compute upper triangle of product
[ii, jj] = find(triu(ones(N)));
upperMask = false(N,N);
upperMask(ii + N*(jj-1)) = true;
Mu = zeros(N);
Mu(upperMask) = v(ii).*v(jj); % other lines always the same computation

% validate
M = v'*v;
isequal(triu(M),Mu)

This next way won't be faster than the naive approach either, but here's another solution to compute the lower triangle with bsxfun:

Ml = bsxfun(@(x,y) [zeros(y-1,1); x(y:end)*y],v',v);

For the upper triangle:

Mu = bsxfun(@(x,y) [x(1:y)*y; zeros(numel(x)-y,1)],v',v);
isequal(triu(M),Mu)

Another solution for the whole matrix using cumsum for this special case (where v=1:N). This one is actually close in speed.

M = cumsum(repmat(v,[N 1]));

Maybe these can be a starting point for something better.

like image 40
chappjc Avatar answered Sep 19 '22 21:09

chappjc


This is 3 times faster than (1:N).'*(1:N) provided an int32 result is acceptable (it's even faster if the numbers are small enough to use int16 instead of int32):

N = 1000;
aux = int32(1:N);
result = bsxfun(@times,aux.',aux);

Benchmarking:

>> N = 1000; aux = int32(1:N); tic, for count = 1:1e2, bsxfun(@times,aux.',aux); end, toc
Elapsed time is 0.734992 seconds.

>> N = 1000; aux = 1:N; tic, for count = 1:1e2, aux.'*aux; end, toc
Elapsed time is 2.281784 seconds.

Note that aux.'*aux cannot be used for aux = int32(1:N).

As pointed out by @DanielE.Shub, if the result is needed as a double matrix, a final cast has to be done, and in that case the gain is very small:

>> N = 1000; aux = int32(1:N); tic, for count = 1:1e2, double(bsxfun(@times,aux.',aux)); end, toc
Elapsed time is 2.173059 seconds.
like image 38
Luis Mendo Avatar answered Sep 19 '22 21:09

Luis Mendo