Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sparse dot product in SQL

Imagine I have a table which stores a series of sparse vectors. A sparse vector means that it stores only the nonzero values explicitly in the data structure. I could have a 1 million dimensional vector, but I only store the values for the dimensions which are nonzero. So the size is proportional to the number of nonzero entries, not the dimensionality of the vector.

Table definition would be something like this: vector_id : int dimension : int value : float

Now, in normal programming land I can compute the inner product or dot product of two vectors in O(|v1| + |v2|) time. Basically the algorithm is to store the sparse vectors sorted by dimension and iterate through the dimensions in each until you find collisions between dimensions and multiply the values of the shared dimension and keep adding those up until you get to the end of either one of the vectors.

What's the fastest way to pull this off in SQL?

like image 247
Chris Harris Avatar asked Jun 29 '09 20:06

Chris Harris


1 Answers

You should be able to replicate this algorithm in one query:

select sum(v1.value * v2.value)
from vectors v1
inner join vectors v2
on v1.dimension = v2.dimension
where v1.vector_id = ...
and v2.vector_id = ...
like image 190
dpmattingly Avatar answered Sep 28 '22 02:09

dpmattingly