 

Is it possible to use scikit TSNE on a large sparse matrix?

The scikit-learn documentation says that fit_transform can only be used with dense matrices, but I have a sparse matrix in CSR format that I want to run t-SNE on. The documentation says to use the fit method for sparse matrices, but that method doesn't return the low-dimensional embedding.

I appreciate I could use the .todense() method as in this question, but my data set is very large (0.4*10^6 rows and 0.5*10^4 columns), so the dense version won't fit in memory. Ideally I would do this with sparse matrices throughout. Is there a way to use scikit TSNE (or any other Python implementation of t-SNE) to reduce the dimensionality of a large sparse matrix and return the low-dimensional embedding so I can visualize it?

PyRsquared asked Oct 18 '25 14:10



1 Answer

From that same documentation:

It is highly recommended to use another dimensionality reduction method (e.g. PCA for dense data or TruncatedSVD for sparse data) to reduce the number of dimensions to a reasonable amount (e.g. 50) if the number of features is very high. This will suppress some noise and speed up the computation of pairwise distances between samples.

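So use sklearn.decomposition.TruncatedSVD first: it accepts the sparse matrix directly and returns a dense array with far fewer columns (e.g. 50), which you can then pass to TSNE.fit_transform to get the embedding.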
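A minimal sketch of that two-step pipeline, assuming a small random CSR matrix as a stand-in for the real data (the sizes, density, and random_state values here are illustrative choices, not from the question):

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE

# Synthetic stand-in for the real CSR matrix (assumption: demo-sized, 300 x 1000).
X_sparse = sparse.random(300, 1000, density=0.01, format="csr", random_state=0)

# Step 1: TruncatedSVD works on the sparse matrix directly and
# returns a dense array with n_components columns.
svd = TruncatedSVD(n_components=50, random_state=0)
X_reduced = svd.fit_transform(X_sparse)

# Step 2: t-SNE on the dense, low-dimensional array returns the 2-D embedding.
tsne = TSNE(n_components=2, random_state=0)
embedding = tsne.fit_transform(X_reduced)

print(X_reduced.shape)   # (300, 50)
print(embedding.shape)   # (300, 2)
```

For the sizes in the question, the intermediate array would be 0.4*10^6 x 50 float64 values, roughly 160 MB, versus about 16 GB for the fully dense matrix. t-SNE itself can still be slow on that many rows, so it may be worth trying a random subsample first.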

blacksite answered Oct 21 '25 03:10
