
How can I optimize this indexing algorithm?

My Questions

  • Is there any way that I can speed up this calculation?
  • Is there a better algorithm or implementation that I can use to calculate the same values?

Describing the algorithm

I have a complex indexing problem that I'm struggling to solve in an efficient way.

The goal is to calculate the matrix w_prime using a combination of values from the equally sized matrices w, dY, and dX.

The value of w_prime(i,j) is calculated as mean( w( indY & indX ) ), where indY and indX are logical masks marking the elements of dY and dX that are equal to i and j respectively.

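Here's a simple implementation in MATLAB of an algorithm to compute w_prime:

for i = 1:size(w_prime,1)
  indY = (dY == i);                          % mask: elements of dY equal to i
  for j = 1:size(w_prime,2)
    indX = (dX == j);                        % mask: elements of dX equal to j
    w_prime(i,j) = mean( w( indY & indX ) ); % average w where both masks hold
  end
end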

Performance Problems

This implementation is sufficient for the example case below; however, in my actual use case w, dY, and dX are ~3000x3000 and w_prime is ~60x900, meaning that each index calculation operates on ~9 million elements. Needless to say, this implementation is too slow to be usable. Additionally, I'll need to run this code a few dozen times.
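As a rough back-of-the-envelope sketch of why the loop is slow, the approximate sizes quoted above can be multiplied out. The variable names here are illustrative only and do not come from the original code.

```matlab
n_elements = 3000 * 3000;           % elements in each of w, dY and dX (~9 million)
n_cells    = 60 * 900;              % entries of w_prime to fill (54,000)

% Each inner iteration scans a full matrix at least once (dX == j),
% so a lower bound on element-wise comparisons per run is:
n_compares = n_elements * n_cells;  % 4.86e11
```

Running the code a few dozen times multiplies this figure accordingly.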

Example Calculation

If I want to compute w_prime(1,1):

  • Find the indices of dY that equal 1, save as indY
  • Find the indices of dX that equal 1, save as indX

  • Find the intersection of indY and indX, save as ind

  • Save the mean( w(ind) ) to w_prime(1,1)

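The steps above can be sketched as follows, using small made-up 3x3 matrices. The values are hypothetical and chosen only so the arithmetic is easy to follow; they are not the data from the original images.

```matlab
% Hypothetical inputs, for illustration only.
w  = [0.1 0.2 0.3;
      0.4 0.5 0.6;
      0.7 0.8 0.9];
dY = [1 1 2;
      1 2 2;
      2 1 1];
dX = [1 2 1;
      1 1 2;
      2 1 1];

indY = (dY == 1);            % logical mask: where dY equals 1
indX = (dX == 1);            % logical mask: where dX equals 1
ind  = indY & indX;          % elements where both conditions hold

% Selected weights are 0.1, 0.4, 0.8 and 0.9, so the mean is 2.2 / 4 = 0.55.
w_prime_11 = mean(w(ind));
```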

General Problem Description

I have a set of points defined by two vectors, X and T, both 1xN where N is ~3000. Additionally, the values of X and T are integers bounded by the intervals [1, 60] and [1, 900] respectively.

The matrices dX and dT are simply distance matrices, meaning that they contain the pairwise distances between the points, i.e. dx(i,j) is equal to abs( x(i) - x(j) ).

They are calculated using: dx = pdist(x);
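For reference, here is a minimal sketch of building a full N-by-N distance matrix of this kind. It assumes x is a 1-by-N row vector as described above; the sample values are made up. Note that pdist (Statistics Toolbox) treats each row as one observation and returns a condensed vector, so a row vector would need to be passed as a column and the result expanded with squareform.

```matlab
% Hypothetical coordinates for five points, for illustration only.
x = [3 7 7 1 4];                 % 1-by-N row vector

% Pairwise absolute differences as a full N-by-N matrix (base MATLAB only).
dX = abs(bsxfun(@minus, x(:), x(:).'));

% With the Statistics Toolbox, the same matrix could be obtained with
%   dX = squareform(pdist(x(:)));
```

The diagonal, and any pair of points that share a coordinate, has distance 0. Such entries never match the tests dY == i or dX == j for i, j >= 1, so the loop in the question ignores them.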

The matrix w can be thought of as a weight matrix that describes how much influence one point has on another.

The purpose of calculating w_prime(a,b) is to determine the average weight between the sub-set of points that are separated by a in the X dimension and b in the T dimension.

This can be expressed as follows:

w_prime(a,b) = mean( w(i,j) ), taken over all pairs (i,j) with dX(i,j) == a and dT(i,j) == b

asked Sep 12 '12 15:09 by slayton


1 Answer

This is straightforward with ACCUMARRAY:

nx = max(dX(:));
ny = max(dY(:));

w_prime = accumarray([dX(:),dY(:)],w(:),[nx,ny],@mean,NaN)

The output will be an nx-by-ny array with NaNs wherever there was no corresponding pair of indices. If you're sure that every pair of indices will always be present, you can simplify the above calculation to

w_prime = accumarray([dX(:),dY(:)],w(:),[],@mean)
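To illustrate the difference between the two calls, here is a small sketch with made-up subscripts and values (subs and vals are hypothetical, not from the question). With an explicit fill value, cells that receive no data become NaN; when the fill value is omitted, accumarray fills them with 0.

```matlab
subs = [1 1; 1 1; 2 2];     % two entries land in cell (1,1), one in (2,2)
vals = [1; 3; 5];

A = accumarray(subs, vals, [2 2], @mean, NaN);   % [2 NaN; NaN 5]
B = accumarray(subs, vals, [],    @mean);        % [2 0;   0   5]
```

A 0 in B is therefore ambiguous: it can be a genuine mean of zero or an empty cell, which is why the simplified form is only safe when every (i,j) pair occurs.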

So, what does accumarray do? It looks at the rows of [dX(:),dY(:)]. Each row gives the (i,j) coordinate pair in w_prime to which the corresponding entry of w(:) contributes. For all rows equal to (1,1), it applies the function (@mean) to the corresponding entries in w(:) and writes the output into w_prime(1,1); every other pair is handled the same way.
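One way to sanity-check the vectorized version against the double loop from the question is sketched below. The inputs are small, deterministic stand-ins, not the real data. Two assumptions are worth flagging. First, the subscripts are ordered [dY, dX] here so that rows follow dY as in the loop, which gives the transpose of the [dX, dY] call above. Second, entries where dY or dX is 0 are dropped, because accumarray only accepts positive integer subscripts, whereas the loop simply never matches them. The sum-over-count variant is a common alternative to passing @mean and may be faster, though that is not measured here.

```matlab
% Small deterministic stand-ins for the real matrices (illustrative only).
[I, J] = ndgrid(1:40, 1:40);
dY = mod(I + J, 5);              % integers 0..4
dX = mod(I .* J, 7);             % integers 0..6
w  = sin(I) + cos(J);

ny = max(dY(:));
nx = max(dX(:));

% Keep only entries with positive subscripts.
valid = (dY > 0) & (dX > 0);
subs  = [dY(valid), dX(valid)];  % rows follow dY, columns follow dX
vals  = w(valid);

% Single pass with accumarray; empty cells become NaN.
w_fast = accumarray(subs, vals, [ny, nx], @mean, NaN);

% Same means as (sum of weights) ./ (count); 0/0 gives NaN for empty cells.
w_ratio = accumarray(subs, vals, [ny, nx]) ./ accumarray(subs, 1, [ny, nx]);

% Reference: the double loop from the question.
w_loop = NaN(ny, nx);
for i = 1:ny
  indY = (dY == i);
  for j = 1:nx
    indX = (dX == j);
    w_loop(i, j) = mean(w(indY & indX));
  end
end

% Compare on cells that received data (summation order may differ slightly).
filled         = ~isnan(w_loop);
max_diff       = max(abs(w_fast(filled)  - w_loop(filled)));
max_diff_ratio = max(abs(w_ratio(filled) - w_loop(filled)));
```

Both differences should be at the level of floating-point rounding. If the real dX and dY contain zeros (for example on the diagonal of a distance matrix), the same masking step would be needed before calling accumarray.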

answered Sep 19 '22 06:09 by Jonas