I am trying to apply the Random Projections method to a very sparse dataset. I have found papers and tutorials about the Johnson-Lindenstrauss method, but all of them are full of equations that don't give me any meaningful explanation. For example, this document on Johnson-Lindenstrauss.
Unfortunately, from this document I can get no idea about the implementation steps of the algorithm. It's a long shot, but is there anyone who can give me the plain-English version, or very simple pseudocode, of the algorithm? Or where should I start digging into these equations? Any suggestions?
For example, what I understand from the algorithm, by reading this paper concerning Johnson-Lindenstrauss, is this: suppose we have an AxB matrix, where A is the number of samples and B is the number of dimensions, e.g. 100x5000, and I want to reduce its dimension to 500, which will produce a 100x500 matrix. As far as I understand, I first need to construct a 100x500 matrix and fill the entries randomly with +1 and -1 (each with 50% probability).
Edit:
Okay, I think I'm starting to get it. So we have a matrix A which is mxn. We want to reduce it to E, which is mxk.
What we need to do is construct a matrix R of dimension nxk and fill it with 0, -1, or +1, with probabilities 2/3, 1/6 and 1/6 respectively.
After constructing this R, we simply do the matrix multiplication AxR to find our reduced matrix E. But we don't need to do a full matrix multiplication, because wherever an element of R is 0 there is nothing to compute, so we simply skip it. If it is +1, we just add the corresponding column, and if it is -1, we subtract it. So we only need summation rather than multiplication to find E, and that is what makes this method very fast.
It turned out to be a very neat algorithm, although I feel a bit stupid for not getting the idea sooner.
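To check my own understanding, here is a minimal NumPy sketch of the procedure as I understand it. The function name, the random seed, and the sqrt(3/k) scaling factor are my own choices; the scaling is related to the 1/sqrt(k) normalization mentioned in the answers below, with an extra sqrt(3) to compensate for the zeros in R.

```python
import numpy as np

def random_projection(A, k, seed=0):
    """Reduce the m x n matrix A to m x k using a sparse random matrix R
    whose entries are 0, +1, -1 with probabilities 2/3, 1/6, 1/6."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    # Construct R (n x k) exactly as described above.
    R = rng.choice([0.0, 1.0, -1.0], size=(n, k), p=[2/3, 1/6, 1/6])
    # The sqrt(3/k) factor rescales the result so that pairwise distances
    # are preserved in expectation (assumed scaling; see the answers below).
    return (A @ R) * np.sqrt(3.0 / k)

# Toy usage: 100 samples in 5000 dimensions reduced to 500 dimensions.
A = np.random.randn(100, 5000)
E = random_projection(A, 500)
print(E.shape)   # (100, 500)
```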
You have the idea right. However, as I understand random projection, the rows of your matrix R should have unit length. I believe that is roughly what the 1/sqrt(k) normalization is for: it compensates for the fact that they are not unit vectors.
It isn't a projection, strictly speaking, but it's nearly one: R's rows aren't orthonormal, but within a much higher-dimensional space they very nearly are. In fact, the dot product of any two of those vectors you choose will be pretty close to 0. That is why this is generally a good approximation of actually finding a proper basis for the projection.
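If you want to convince yourself of that numerically, here is a tiny sketch (my own illustration, not from any of the papers): draw two random ±1 vectors in a high dimension, normalize them, and look at their dot product.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000  # the ambient (high) dimension

# Two random sign vectors, rescaled to unit length.
u = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
v = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)

# Their dot product is close to, but not exactly, 0 (typical size ~ 1/sqrt(n)),
# which is why R gives only an approximate, "nearly orthogonal" projection.
print(u @ v)
```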
The mapping from the high-dimensional data A to the low-dimensional data E is given in the statement of Theorem 1.1 in the latter paper: it is simply a scalar multiplication followed by a matrix multiplication. The data vectors are the rows of the matrices A and E. As the author points out in Section 7.1, you don't need to use a full matrix multiplication algorithm.
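To illustrate that last point, here is a rough sketch (my own, assuming the 0/+1/-1 matrix R and the sqrt(3/k) scaling from the question's edit, not the paper's code) of how E can be built with only column additions and subtractions instead of a full matrix multiply:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 5000, 500
A = rng.standard_normal((m, n))
R = rng.choice([0, 1, -1], size=(n, k), p=[2/3, 1/6, 1/6])

# Column j of E is the sum of A's columns where R[:, j] == +1,
# minus the sum of A's columns where R[:, j] == -1; zeros are skipped.
E = np.empty((m, k))
for j in range(k):
    E[:, j] = A[:, R[:, j] == 1].sum(axis=1) - A[:, R[:, j] == -1].sum(axis=1)
E *= np.sqrt(3.0 / k)   # the scalar multiplication, with sqrt(3) folded in
print(E.shape)           # (100, 500)
```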