import numpy as np
from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, init='pca', n_iter=5000)
print(tsne.fit_transform(np.array([[1, 2, 3], [3, 4, 3], [1, 2, 3], [3, 3, 3]])))
Output:
[[ 547.9452404    11.31943926]
 [-152.33035505 -223.32060683]
 [  97.57201578   84.04839505]
 [-407.18939464  124.50285141]]
The vector [1,2,3] appears twice in the input, yet it was mapped to two different output vectors. Why is that?
Edit 1:
The example above is just a toy example to illustrate the issue. My actual data is a numpy array of shape (500, 100), and the same problem persists.
It is an interesting question. t-SNE transforms the samples into a different space that preserves the distances between them, but it does not guarantee to preserve the values of the samples themselves. It treats each sample as a distinct point and tries to map the distances from that point to every other sample into another space. It only takes into account a sample's relative distance to every other point, not its value.
You can check that:
>>> a = np.array([[1,2,3],[3,4,3],[1,2,3],[3,3,3]])
>>> b = TSNE(n_components=2).fit_transform(a)
>>> from sklearn.metrics import euclidean_distances
>>> print(euclidean_distances(b[0:1], b).sum())
2498.7985853798709
>>> print(euclidean_distances(b[2:3], b).sum())
2475.26750924
>>> print(b)
[[-201.41082311  361.14132525]
 [-600.23416334 -523.48599925]
 [ 180.07532649 -288.01414955]
 [ 553.42486539  538.85793453]]
It roughly preserves similar distances (up to scale) from both samples to every other sample, even though it gives them different representations.
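To make that concrete, here is a small sketch comparing the pairwise distance matrices in the input space and in the embedded space. The perplexity=2 and random_state=0 settings are my own choices for this 4-sample toy case (perplexity must be smaller than the number of samples), not anything from the question. Rows 0 and 2 are at distance 0 in the input, yet they typically end up at a non-zero distance in the embedding, while the overall distance structure is only roughly preserved:

import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import euclidean_distances

a = np.array([[1, 2, 3], [3, 4, 3], [1, 2, 3], [3, 3, 3]])

# Pairwise distances in the original space: rows 0 and 2 are identical,
# so their distance is exactly 0.
print(euclidean_distances(a))

# Embed into 2D. A small perplexity is needed for only 4 samples;
# random_state just makes the run repeatable.
emb = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(a)

# Pairwise distances in the embedded space: the duplicate rows usually do
# not land at distance 0; only the relative structure is (roughly) kept.
print(euclidean_distances(emb))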
As for why it works so badly here, my guess is that you only have 4 samples in 3 dimensions. t-SNE can't infer a proper mapping from so few samples; it is meant for high-dimensional data with many samples.
For lower-dimensional data I would say a simple PCA would do the job: PCA your data and keep the top 2 components.
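As a rough sketch of that alternative (keeping 2 components, per the suggestion above), PCA is a deterministic linear projection, so duplicate rows are guaranteed to land on the same point:

import numpy as np
from sklearn.decomposition import PCA

a = np.array([[1, 2, 3], [3, 4, 3], [1, 2, 3], [3, 3, 3]])

# Project onto the top 2 principal components.
reduced = PCA(n_components=2).fit_transform(a)
print(reduced)

# Because PCA is a deterministic linear map, identical input rows get
# identical coordinates.
print(np.allclose(reduced[0], reduced[2]))  # True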