Is it possible to use scikit TSNE on a large sparse matrix?

Question

The scikit-learn documentation says fit_transform can only be used for dense matrices, but I have a sparse matrix in CSR format that I want to run t-SNE on. The documentation says to use the fit method for sparse matrices, but this doesn't return the low-dimensional embedding.

I appreciate I could use the .todense() method as in this question, but my data set is very large (0.4*10^6 rows and 0.5*10^4 columns), so it won't fit in memory. Ideally I would do this with sparse matrices. Is there a way to use scikit-learn's TSNE (or any other Python implementation of t-SNE) to reduce the dimensionality of a large sparse matrix and return the low-dimensional embedding for visualization?

blacksite · Accepted Answer

From that same documentation:

It is highly recommended to use another dimensionality reduction method (e.g. PCA for dense data or TruncatedSVD for sparse data) to reduce the number of dimensions to a reasonable amount (e.g. 50) if the number of features is very high. This will suppress some noise and speed up the computation of pairwise distances between samples.

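Use sklearn.decomposition.TruncatedSVD to reduce the sparse matrix to a manageable number of dimensions (e.g. 50) first, then run TSNE on the resulting dense array.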
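The recommendation quoted from the documentation could be sketched roughly as follows. This is a minimal illustration, not a tuned pipeline: the helper name `embed_sparse`, the toy matrix sizes, and the perplexity value are assumptions chosen so the example runs quickly, and the random matrix stands in for the real data.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE


def embed_sparse(X, n_svd_components=50, random_state=0):
    """Reduce a sparse matrix with TruncatedSVD, then embed it in 2-D with t-SNE.

    `embed_sparse` is a hypothetical helper for illustration only.
    """
    # TruncatedSVD accepts scipy sparse input directly and returns a dense
    # array of shape (n_samples, n_svd_components), so the full matrix is
    # never densified.
    svd = TruncatedSVD(n_components=n_svd_components, random_state=random_state)
    X_reduced = svd.fit_transform(X)

    # t-SNE now only sees the small dense array.
    tsne = TSNE(n_components=2, perplexity=30, random_state=random_state)
    return tsne.fit_transform(X_reduced)


# Toy stand-in for the real data: 300 rows, 2000 columns, 1% non-zero, CSR format.
X = sparse.random(300, 2000, density=0.01, format="csr", random_state=0)
embedding = embed_sparse(X)
print(embedding.shape)
```

The returned array has one 2-D point per row of the input and can be passed straight to a scatter plot. With hundreds of thousands of rows the t-SNE step itself is likely to dominate the run time, so fitting on a random subsample of rows first may be a reasonable way to check the settings.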

Tags: python, scikit-learn, sparse-matrix, dimensionality-reduction

PyRsquared
