I am trying to transform two datasets, x_train and x_test, using t-SNE. I assume the way to do this is to fit t-SNE to x_train and then transform both x_test and x_train. But I am not able to transform either dataset.
tsne = TSNE(random_state = 420, n_components=2, verbose=1, perplexity=5, n_iter=350).fit(x_train)
I assume that tsne has been fitted to x_train.
But, when I do this:
x_train_tse = tsne.transform(x_subset)
I get:
AttributeError: 'TSNE' object has no attribute 'transform'
Any help will be appreciated. (I know I could use fit_transform, but wouldn't I get the same error on x_test?)
We'll start by loading the required libraries and functions. After loading the Iris dataset, we'll separate it into its data and label parts. Then we'll define the model using the TSNE class; here, the n_components parameter sets the number of target dimensions.
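The steps just described can be sketched as follows. This is a minimal illustration, not a definitive recipe: the variable names and the perplexity and random_state values are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

# Load the Iris dataset and separate the data from the labels.
iris = load_iris()
x, y = iris.data, iris.target

# n_components sets the number of target dimensions (2 for a scatter plot).
# perplexity and random_state are illustrative values, not recommendations.
tsne = TSNE(n_components=2, perplexity=30, random_state=420)

# sklearn's TSNE has no transform(); fit_transform() fits the model and
# returns the embedding of the same data in one step.
embedding = tsne.fit_transform(x)
print(embedding.shape)
```

The resulting array has one 2-D point per input row, so it can be passed straight to a scatter plot coloured by y.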
t-SNE [1] is a tool to visualize high-dimensional data. It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data.
Check out openTSNE [1]. It has all you need: unlike sklearn's TSNE, an embedding fitted with openTSNE has a transform method for new data.
You can also save the trained model, for example with pickle.dump.
[1]: https://opentsne.readthedocs.io/en/latest/index.html
Judging by the sklearn documentation, TSNE simply does not have a transform method.
Also, t-SNE is an unsupervised method for dimensionality reduction/visualization, so it does not really work with a train/test split. You simply take all of your data, call fit_transform to get the embedding, and plot it.
EDIT - It is actually not possible to learn a transformation and reuse it on different data (i.e. train and test), because t-SNE does not learn a mapping function to a lower-dimensional space. Instead, it runs an iterative procedure to find an equilibrium that minimizes a loss/distance on the specific data it was given.
Therefore, if you want to preprocess and reduce the dimensionality of both a train and a test dataset, the way to go is PCA/SVD or autoencoders. t-SNE will only help you with unsupervised tasks :)
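For the PCA route, a minimal sketch looks like this. The Iris data, the split size and n_components=2 are assumptions made only so the example runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Example data; substitute your own x_train / x_test.
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=50, random_state=420
)

# PCA learns a linear mapping from the training data only...
pca = PCA(n_components=2).fit(x_train)

# ...and that same mapping can then be applied to any dataset.
x_train_pca = pca.transform(x_train)
x_test_pca = pca.transform(x_test)

print(x_train_pca.shape, x_test_pca.shape)
```

Because the projection is fitted on x_train alone, nothing from x_test influences it.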
As the accepted answer says, there is no separate transform method, and it probably wouldn't work in a train/test setting.
However, you can still use TSNE without information leakage.
Training Time
Calculate the t-SNE embedding for each record in the training set and use it as a feature in the classification algorithm.
Testing Time
Append your testing data to your training data and call fit_transform on the combined set. Then continue processing your test set, using the t-SNE embedding as a feature on those records.
Does this cause information leakage? No.
Inference Time
New records arrive e.g. as images or table rows.
Add the new row(s) to the training table, calculate TSNE (i.e. where the new sample sits in the space relative to your trained samples). Perform any other processing and run your prediction against the row.
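The testing-time and inference-time steps above can be sketched like this. The helper name tsne_features and the random placeholder data are hypothetical, introduced only for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_features(x_train, x_new, **tsne_kwargs):
    """Embed training rows and new rows together, then split them apart.

    Hypothetical helper: stacks the new rows under the training rows, runs
    fit_transform on the combined table, and returns the two parts.
    """
    stacked = np.vstack([x_train, x_new])
    embedding = TSNE(n_components=2, **tsne_kwargs).fit_transform(stacked)
    n_train = len(x_train)
    return embedding[:n_train], embedding[n_train:]


# Placeholder data standing in for the stored training table and new rows.
rng = np.random.default_rng(420)
x_train = rng.normal(size=(100, 10))
x_new = rng.normal(size=(20, 10))

train_feats, new_feats = tsne_features(
    x_train, x_new, perplexity=5, random_state=420
)
print(train_feats.shape, new_feats.shape)
```

Note that the whole embedding is recomputed on every call, so the training rows get fresh coordinates each time as well.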
It works fine. Sometimes we worry too much about the train/test split because of Kaggle etc. But the main thing is whether your method can be replicated at inference time, with the same expected accuracy in live use. In this case, yes it can!
The only drawback is that you need your training database available at inference time, and depending on its size, the preprocessing might be costly.