Is there any Python library with parallel version of t-SNE algorithm? Or does the multicore/parallel t-SNE algorithm exist?
I'm trying to reduce dimension (300d -> 2d) of all word2vecs in my vocabulary using t-SNE.
Problem: the vocabulary has about 130,000 words, and running t-SNE on all of them takes too long.
A key feature of t-SNE is its tuneable parameter, “perplexity,” which says (loosely) how to balance attention between local and global aspects of your data. The parameter is, in a sense, a guess about the number of close neighbours each point has. The perplexity value has a complex effect on the resulting pictures.
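To see the effect of perplexity concretely, here is a minimal sketch using scikit-learn's `TSNE` (assumed installed; the toy random data stands in for your word vectors):

```python
# Sketch: running t-SNE at two perplexity settings with scikit-learn.
# The random matrix below is a toy stand-in for 300-d word2vec vectors.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

for perplexity in (5, 30):
    emb = TSNE(n_components=2, perplexity=perplexity,
               init="random", random_state=0).fit_transform(X)
    # Each run produces a (200, 2) embedding; low perplexity emphasises
    # local neighbourhoods, higher perplexity more global structure.
    print(perplexity, emb.shape)
```

In practice you would try several perplexity values and compare the plots, since no single value is "correct".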
While t-SNE is a dimensionality reduction technique, it is mostly used for visualization rather than data pre-processing (as you might use PCA).
What is t-SNE? t-SNE is a nonlinear dimensionality reduction technique that is well suited for embedding high-dimensional data into a lower-dimensional space (2D or 3D) for visualization.
t-Distributed Stochastic Neighbour Embedding (t-SNE) applies a non-linear dimensionality reduction whose focus is on keeping very similar data points close together in the lower-dimensional space.
Yes, there is a parallel version of the Barnes-Hut implementation of t-SNE: https://github.com/DmitryUlyanov/Multicore-TSNE
There is also now a newer implementation of t-SNE that uses a fast Fourier transform to significantly speed up the convolution step. It can also use the ANNOY library to perform the nearest-neighbour search; the default tree-based method is available as well, and both take advantage of parallel processing.
Original code is available here: https://github.com/KlugerLab/FIt-SNE
and an R package version here: https://github.com/JulianSpagnuolo/FIt-SNE