I have a complex indexing problem that I'm struggling to solve in an efficient way.

The goal is to calculate the matrix w_prime using a combination of values from the equally sized matrices w, dY, and dX. The value of w_prime(i,j) is calculated as mean( w( indY & indX ) ), where indY and indX are the indices of the entries of dY and dX that are equal to i and j respectively.

Here's a simple implementation in MATLAB of an algorithm to compute w_prime:
for i = 1:size(w_prime,1)
    indY = dY == i;
    for j = 1:size(w_prime,2)
        indX = dX == j;
        w_prime(i,j) = mean( w( indY & indX ) );
    end
end
This implementation is sufficient in the example case below; however, in my actual use case w, dY, and dX are ~3000x3000 and w_prime is ~60x900. That means each index calculation happens over ~9 million elements, so this implementation is far too slow to be usable. Additionally, I'll need to run this code a few dozen times.
If I want to compute w_prime(1,1):

1. Find the indices of dY that equal 1, save as indY
2. Find the indices of dX that equal 1, save as indX
3. AND indY and indX, save as ind
4. Assign mean( w(ind) ) to w_prime(1,1)
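For illustration, the four steps above can be sketched in NumPy on a tiny hypothetical input (the variable names mirror the MATLAB ones; the data is made up):

```python
import numpy as np

# Small stand-ins for the real ~3000x3000 matrices (hypothetical data).
dY = np.array([[1, 1], [2, 1]])
dX = np.array([[1, 2], [1, 1]])
w  = np.array([[10.0, 20.0], [30.0, 40.0]])

indY = (dY == 1)            # step 1: entries of dY equal to 1
indX = (dX == 1)            # step 2: entries of dX equal to 1
ind  = indY & indX          # step 3: elementwise AND of the two masks
w_prime_11 = w[ind].mean()  # step 4: average the selected weights -> 25.0
```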
I have a set of points defined by two vectors X and T, both 1xN where N is ~3000. The values of X and T are integers bound by the intervals (1 60) and (1 900) respectively.

The matrices dX and dT (dT corresponds to dY above) are simply distance matrices, meaning that they contain the pairwise distances between the points, i.e. dX(i,j) is equal to abs( X(i) - X(j) ). They are calculated using: dX = squareform(pdist(X'));
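The same kind of pairwise distance matrix can be built with plain broadcasting; here is a NumPy sketch on a small made-up vector:

```python
import numpy as np

x = np.array([1, 4, 6])               # hypothetical integer positions
dx = np.abs(x[:, None] - x[None, :])  # dx[i, j] == abs(x[i] - x[j])
# dx is symmetric with zeros on the diagonal
```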
The matrix w can be thought of as a weight matrix that describes how much influence one point has on another. The purpose of calculating w_prime(a,b) is to determine the average weight between the subset of point pairs that are separated by a in the X dimension and b in the T dimension. This can be expressed as:

    w_prime(a,b) = mean( w(i,j) : dX(i,j) == a and dT(i,j) == b )
This is straightforward with ACCUMARRAY:
nx = max(dX(:));
ny = max(dY(:));
w_prime = accumarray([dX(:),dY(:)],w(:),[nx,ny],@mean,NaN)
The output will be an nx-by-ny sized array with NaNs wherever there was no corresponding pair of indices. If you're sure that there will be a full complement of indices all the time, you can simplify the above calculation to

w_prime = accumarray([dX(:),dY(:)],w(:),[],@mean)

So, what does accumarray do? It looks at the rows of [dX(:),dY(:)]. Each row gives the (i,j) coordinate pair in w_prime to which that row contributes. For all pairs (1,1), it applies the function (@mean) to the corresponding entries in w(:), and writes the output into w_prime(1,1), and likewise for every other coordinate pair.
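NumPy has no direct accumarray equivalent, but the same grouped mean can be sketched with np.add.at (the data below is a tiny illustrative example, with 1-based subscripts as in the MATLAB call):

```python
import numpy as np

# Flattened subscripts and weights, mimicking [dX(:),dY(:)] and w(:).
dX = np.array([1, 1, 2, 2])
dY = np.array([1, 1, 1, 2])
w  = np.array([2.0, 4.0, 5.0, 7.0])

nx, ny = dX.max(), dY.max()
sums   = np.zeros((nx, ny))
counts = np.zeros((nx, ny))
np.add.at(sums,   (dX - 1, dY - 1), w)  # accumulate weights per (i,j) bin
np.add.at(counts, (dX - 1, dY - 1), 1)  # count contributions per bin
with np.errstate(invalid='ignore'):
    w_prime = sums / counts             # NaN where a bin received no entries
```

Bin (1,1) receives weights 2 and 4, so w_prime[0,0] is their mean, 3.0; the empty bin (1,2) comes out as NaN, matching accumarray's fillval behavior.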