 

Why does TSNE in sklearn.manifold give different answers for the same values?

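import numpy as np
from sklearn.manifold import TSNE

# Recent scikit-learn releases rename n_iter to max_iter and require
# perplexity < n_samples, so this 4-sample toy needs e.g. perplexity=2 there.
tsne = TSNE(n_components=2, init='pca', n_iter=5000)

print(tsne.fit_transform(np.array([[1,2,3],[3,4,3],[1,2,3],[3,3,3]])))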

outputs:

[[ 547.9452404    11.31943926]
 [-152.33035505 -223.32060683]
 [  97.57201578   84.04839505]
 [-407.18939464  124.50285141]]

The vector [1,2,3] appears twice in the input, yet its two copies were mapped to different 2-D vectors.

Why is that?

Edit 1:

The example above is just a toy example to illustrate the issue. My actual data is a NumPy array of shape (500, 100), and the same problem persists.

Abhishek asked Oct 11 '25 10:10



1 Answer

This is an interesting question. t-SNE maps the samples into a new space that tries to preserve the relationships between them (it works on pairwise similarities derived from their distances), but it does not guarantee that equal input values end up with equal outputs. It treats every sample as a separate point and tries to reproduce, in the low-dimensional space, how close that point is to each of the other samples. The value of a sample is not used directly, only its relation to every other point.
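To make the "relations only" point concrete, here is a minimal pure-Python sketch of the first step of t-SNE: turning pairwise distances into conditional affinities p_{j|i}. It is a simplification, not scikit-learn's code: it uses one fixed Gaussian bandwidth `sigma` (an assumption) instead of the per-point bandwidth that TSNE calibrates from `perplexity`, and the helper name `conditional_affinities` is hypothetical.

```python
import math


def conditional_affinities(X, sigma=1.0):
    """Return P with P[i][j] = p_{j|i}, using one fixed bandwidth (simplification)."""
    n = len(X)
    P = []
    for i in range(n):
        # Gaussian kernel on squared Euclidean distances to every *other* point;
        # a point is never its own neighbour, so the diagonal weight is 0.
        w = [
            0.0 if j == i else math.exp(-math.dist(X[i], X[j]) ** 2 / (2 * sigma**2))
            for j in range(n)
        ]
        total = sum(w)
        P.append([wj / total for wj in w])
    return P


X = [[1, 2, 3], [3, 4, 3], [1, 2, 3], [3, 3, 3]]
P = conditional_affinities(X)

# Shifting every sample by the same offset changes the values but not the
# distances, so these affinities (what the t-SNE cost tries to match) stay the same.
P_shifted = conditional_affinities([[v + 100 for v in row] for row in X])
print(all(math.isclose(a, b) for ra, rb in zip(P, P_shifted) for a, b in zip(ra, rb)))  # True

# The two copies of [1, 2, 3] (rows 0 and 2) see the other samples identically.
print(math.isclose(P[0][1], P[2][1]), math.isclose(P[0][3], P[2][3]))  # True True
```

The embedding itself is then found by an iterative optimization that gives each sample its own coordinates, so nothing forces the two copies onto the same output point.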

You can check that (t-SNE is stochastic, so the exact numbers will differ from run to run):

>>> import numpy as np
>>> from sklearn.manifold import TSNE
>>> from sklearn.metrics import euclidean_distances
>>> a = np.array([[1,2,3],[3,4,3],[1,2,3],[3,3,3]])
>>> b = TSNE(n_components=2).fit_transform(a)
>>> print(euclidean_distances(b[[0]], b).sum())
2498.7985853798709
>>> print(euclidean_distances(b[[2]], b).sum())
2475.26750924
>>> print(b)
[[-201.41082311  361.14132525]
 [-600.23416334 -523.48599925]
 [ 180.07532649 -288.01414955]
 [ 553.42486539  538.85793453]]

So the two copies of [1,2,3] keep roughly similar total distances to the other samples (given the scale of the embedding), even though they get different coordinates.
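The totals printed above can be recomputed by hand from the printed embedding `b`. This stdlib-only sketch sums, for each row, its Euclidean distances to all rows (the self-distance is 0), which is what `euclidean_distances(b[[i]], b).sum()` computes; `total_distances` is a hypothetical helper name.

```python
import math

# Embedding printed in the answer (rows 0 and 2 are the two copies of [1,2,3]).
b = [
    [-201.41082311, 361.14132525],
    [-600.23416334, -523.48599925],
    [180.07532649, -288.01414955],
    [553.42486539, 538.85793453],
]


def total_distances(points):
    """Sum of Euclidean distances from each point to every point (self included, = 0)."""
    return [sum(math.dist(p, q) for q in points) for p in points]


totals = total_distances(b)
print(round(totals[0], 2), round(totals[2], 2))  # 2498.8 2475.27
```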

As for why it works so badly here, my guess is that it is because you only have 4 samples in 3 dimensions. t-SNE cannot infer a good mapping from so few samples; it is meant for high-dimensional data with many samples.
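One concrete way to see why 4 samples is too few: TSNE's `perplexity` parameter (30 by default in scikit-learn) acts roughly as the effective number of neighbours per point, defined as 2 raised to the Shannon entropy (in bits) of that point's neighbour distribution. With n samples each point has only n − 1 possible neighbours, and entropy is largest for a uniform distribution, so the perplexity can never exceed n − 1. The helper below is a hypothetical illustration, not a scikit-learn function.

```python
import math


def perplexity(probs):
    """2 ** H, where H is the Shannon entropy (in bits) of a discrete distribution."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2**h


n_samples = 4
n_neighbours = n_samples - 1

# The uniform distribution over the 3 possible neighbours has maximal entropy...
print(round(perplexity([1 / n_neighbours] * n_neighbours), 6))  # 3.0
# ...and any non-uniform one has lower perplexity, far below the default of 30.
print(perplexity([0.9, 0.05, 0.05]) < n_neighbours)  # True
```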

For low-dimensional data like this, a simple PCA should do the job: project your data with PCA and keep the top 2 components.
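A minimal NumPy sketch of that suggestion, assuming the usual centre-then-SVD formulation of PCA. The helper `pca_top2` is hypothetical; in scikit-learn you would normally call `PCA(n_components=2).fit_transform(X)`, which should give the same projection up to the sign of each component. Because PCA is a fixed linear projection of the centred data, identical input rows always map to identical outputs, unlike t-SNE.

```python
import numpy as np


def pca_top2(X):
    """Project X onto its top-2 principal components (centre, then SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                # PCA works on centred data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                   # rows of Vt are the principal axes


X = np.array([[1, 2, 3], [3, 4, 3], [1, 2, 3], [3, 3, 3]])
Y = pca_top2(X)
print(Y.shape)                             # (4, 2)
print(np.allclose(Y[0], Y[2]))             # True: both copies of [1,2,3] coincide
```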

Imanol Luengo answered Oct 16 '25 09:10



