The scikit-learn documentation explains that fit_transform can only be used for dense matrices, but I have a sparse matrix in CSR format that I want to run t-SNE on. The documentation says to use the fit method for sparse matrices, but this doesn't return the low-dimensional embedding.
I appreciate I could use the .todense() method as in this question, but my data set is very large (0.4*10^6 rows and 0.5*10^4 columns), so the dense version won't fit in memory. Really, it would be nice to do this using sparse matrices. Is there a way to use scikit-learn's TSNE (or any other Python implementation of t-SNE) to reduce the dimensionality of a large sparse matrix and return the low-dimensional embedding so I can visualize it?
From that same documentation:
It is highly recommended to use another dimensionality reduction method (e.g. PCA for dense data or TruncatedSVD for sparse data) to reduce the number of dimensions to a reasonable amount (e.g. 50) if the number of features is very high. This will suppress some noise and speed up the computation of pairwise distances between samples.
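So use sklearn.decomposition.TruncatedSVD first, instead of calling .todense(): it accepts sparse input and returns a dense array with far fewer columns, which you can then pass to TSNE.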
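A minimal sketch of that two-step approach, assuming a small random CSR matrix as a stand-in for the real data (the 300 x 1000 shape, the density, and the perplexity value are placeholders chosen only so the example runs quickly; the 50 components follow the figure quoted from the documentation):

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE

# Stand-in for the real data: a small random sparse matrix in CSR format.
# The shape and density here are assumptions for illustration only.
X = sparse.random(300, 1000, density=0.01, format="csr",
                  random_state=0, dtype=np.float64)

# Step 1: TruncatedSVD works directly on the sparse matrix and
# returns a dense array with n_components columns.
svd = TruncatedSVD(n_components=50, random_state=0)
X_reduced = svd.fit_transform(X)

# Step 2: run t-SNE on the much smaller dense matrix to get the
# 2-D embedding used for plotting.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X_reduced)

print(X_reduced.shape, embedding.shape)
```

For the sizes in the question, the reduced matrix would be about 0.4*10^6 x 50 float64 values, roughly 160 MB, compared with about 16 GB for the full dense matrix. The t-SNE step itself can still be slow at that many rows, so it may be worth trying the pipeline on a random subset of rows first.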