 

Randomized SVD for LSA/LSI in a Windows environment

Tags:

c#

svd

lsa

I am working on a project that uses latent semantic analysis (LSA). This requires singular value decomposition (SVD), sometimes on large data sets. Is there an implementation of randomized SVD (rSVD) available for the Windows/Visual Studio environment? I saw a project called redsvd, but it seems to be supported on Linux only.

Leeor asked Jun 09 '13 07:06



1 Answer

ILNumerics might have it. I didn't see whether it does rSVD, and I have no personal experience with the library, but it is available through NuGet.

http://ilnumerics.net

Here are the docs on their SVD implementation:

http://ilnumerics.net/apidoc/Index.html?topic=html/Overload_ILNumerics_ILMath_svd.htm

There is also NAG, but it's a paid product: http://www.nag.co.uk/numeric/numerical_libraries.asp

I also checked out redsvd, and I bet I could either port it to C# for you or at the very least get it to compile on Windows. If those don't meet your needs, let me know and I'll look into the complexity of the port.

UPDATE:

Well, I got home tonight and decided to give it a shot. Here's a really quick way to get redsvd working on Windows using Visual Studio 2010. I posted it on GitHub:

https://github.com/hoonto/redsvdwin

Open rsvd3.sln in Visual Studio, build it, and you'll get rsvd3.exe in the Debug directory.

Run that:

C:\Users\MLM\Documents\Visual Studio 2010\Projects\redsvdwin\Debug>rsvd3.exe
usage: redsvd --input=string --output=string [options] ...

redsvd supports the following format types (one line for each row)

[format=dense] (<value>+\n)+
[format=sparse] ((colum_id:value)+\n)+
Example:
>redsvd -i imat -o omat -r 10 -f dense
compuate SVD for a dense matrix in imat and output omat.U omat.V, and omat.S
with the 10 largest eigen values/vectors
>redsvd -i imat -o omat -r 3 -f sparse -m PCA
compuate PCA for a sparse matrix in imat and output omat.PC omat.SCORE
with the 3 largest principal components

options:
  -i, --input     input file (string)
  -o, --output    output file's prefix (string)
  -r, --rank      rank       (int [=10])
  -f, --format    format type (dense|sparse) See example.  (string [=dense])
  -m, --method    method (SVD|PCA|SymEigen) (string [=SVD])
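
The two format lines in the usage text above describe the input files only tersely, so here is a minimal C++ sketch of how one row of a matrix could be serialized for each format. The helper names dense_line and sparse_line are hypothetical and not part of redsvd. The sketch also assumes that values are separated by a single space, that sparse column ids are zero-based, and that zero entries are simply omitted from sparse rows; the usage text states none of these, so check them against the redsvd sources before relying on them.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helpers for illustration only; they are not part of redsvd.

// Dense row: every value of the row, separated by a single space
// (the separator is an assumption, the usage text does not name one).
std::string dense_line(const std::vector<double>& row) {
    std::ostringstream out;
    for (std::size_t j = 0; j < row.size(); ++j) {
        if (j > 0) out << ' ';
        out << row[j];
    }
    return out.str();
}

// Sparse row: column_id:value pairs for the nonzero entries only.
// Column ids are assumed to be zero-based here.
std::string sparse_line(const std::vector<double>& row) {
    std::ostringstream out;
    bool first = true;
    for (std::size_t j = 0; j < row.size(); ++j) {
        if (row[j] == 0.0) continue;  // omit zero entries
        if (!first) out << ' ';
        out << j << ':' << row[j];
        first = false;
    }
    return out.str();
}
```

Writing one such line per matrix row, each followed by a newline, to a file named imat would give an input for a command like the `redsvd -i imat -o omat -r 10 -f dense` example in the usage text.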

And there it is. By the way, this builds redsvdMain.cpp. If you want the Incr file with main in it instead, exclude redsvdMain.cpp and include redsvdMainIncr.cpp. Since both files define main, I excluded the Incr version and built the regular version.

Also, I included the Eigen3 headers in the GitHub repository and added them to the Additional Include Directories for the solution configuration, so you don't need to fiddle with that at all.

One last thing: to my knowledge there is no cxxabi.h for Visual Studio, so I did some cheating. You'll see where I've made changes because they are commented like so:

//MLM: commented next 3
//...
//...
//...
//MLM: added 1
...

and so forth. So if you need to make adjustments, you'll know where my changes are.
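
For context on the cxxabi.h point: that header ships with GCC and Clang and is commonly used for abi::__cxa_demangle, which turns a mangled type name into a readable one. The following is a minimal sketch of one common way to guard such a call so the same code compiles under Visual Studio. It is not taken from redsvd or from the redsvdwin repository, the helper name readable_type_name is hypothetical, and the actual changes in the repository may look different.

```cpp
#include <memory>
#include <string>
#include <typeinfo>

#if defined(__GNUC__)
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang only; not available with Visual Studio
#endif

// Hypothetical helper, not part of redsvd: returns a readable name for T.
template <typename T>
std::string readable_type_name() {
    const char* raw = typeid(T).name();
#if defined(__GNUC__)
    // GCC/Clang return a mangled name, so demangle it. The returned buffer
    // is allocated with malloc and must be released with std::free.
    int status = 0;
    std::unique_ptr<char, void (*)(void*)> demangled(
        abi::__cxa_demangle(raw, nullptr, nullptr, &status), std::free);
    return (status == 0 && demangled) ? std::string(demangled.get())
                                      : std::string(raw);
#else
    // Other compilers, including Visual Studio: use the name as returned,
    // which MSVC already reports in readable form.
    return raw;
#endif
}
```

With this kind of guard, the GCC-specific include and call disappear entirely on other compilers, which is the same effect as commenting the lines out by hand but keeps a single source file for both platforms.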

Matt Mullens answered Oct 02 '22 19:10
