python tsne.transform does not exist?

Tags:

python

machine-learning

I am trying to transform two datasets: x_train and x_test using tsne. I assume the way to do this is to fit tsne to x_train, and then transform x_test and x_train. But, I am not able to transform any of the datasets.

tsne = TSNE(random_state = 420, n_components=2, verbose=1, perplexity=5, n_iter=350).fit(x_train)

I assume that tsne has been fitted to x_train.

But, when I do this:

x_train_tse = tsne.transform(x_subset)

I get:

AttributeError: 'TSNE' object has no attribute 'transform'

Any help will be appreciated. (I know I could do fit_transform, but wouldn't I get the same error on x_test?)


asked Dec 06 '19 13:12

NoLand'sMan


3 Answers

Check out openTSNE [1]. Unlike scikit-learn's TSNE, it can place new points into an existing embedding through a transform method.

You can also save the fitted model, for example with pickle.dump.

[1]: https://opentsne.readthedocs.io/en/latest/index.html

answered Oct 16 '22 04:10

Payam Jome Yazdian

Judging by the documentation of sklearn, TSNE simply does not have any transform method.


Also, TSNE is an unsupervised method for dimensionality reduction/visualization, so it does not really fit a train/test workflow. You simply take all of your data and call fit_transform to get the embedding and plot it.

EDIT - It is actually not possible to learn a transformation and reuse it on different data (i.e. train and test), because t-SNE does not learn a mapping function to a lower-dimensional space. Instead, it runs an iterative procedure to find an equilibrium that minimizes a loss/distance for the specific data it was given.

Therefore, if you want to preprocess and reduce the dimensionality of both a train and a test dataset, the way to go is PCA/SVD or autoencoders. t-SNE will only help you with unsupervised tasks :)
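For comparison, a small sketch of the fit-on-train, transform-on-test pattern with PCA, which does learn a reusable mapping. The random arrays are placeholders for x_train and x_test, and n_components=2 is only chosen to mirror the question.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data standing in for the asker's x_train / x_test.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 10))
x_test = rng.normal(size=(30, 10))

# PCA learns a linear projection from x_train only ...
pca = PCA(n_components=2).fit(x_train)

# ... and the same projection can then be applied to any dataset.
x_train_2d = pca.transform(x_train)
x_test_2d = pca.transform(x_test)

print(x_train_2d.shape, x_test_2d.shape)
```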

answered Oct 16 '22 04:10

Davide ND

As the accepted answer says, there is no separate transform method and it probably wouldn't work in a train/test setting.

However, you can still use TSNE without information leakage.

Training time: Compute the t-SNE embedding for each record in the training set and use it as a feature in your classification algorithm.

Testing time: Append your training and testing data and fit_transform the TSNE on the combined table. Then continue processing your test set, using the t-SNE coordinates as a feature on those records.

Does this cause information leakage? No.

Inference time: New records arrive, e.g. as images or table rows. Add the new row(s) to the training table and calculate the TSNE (i.e. where the new sample sits in the space relative to your training samples). Perform any other processing and run your prediction against the row.

It works fine. Sometimes we worry too much about the train/test split because of Kaggle etc. But the main question is whether your method can be replicated at inference time with the same expected accuracy in live use. In this case, yes it can!

The only drawback is that you need your training database available at inference time and, depending on its size, the preprocessing might be costly.
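The append-and-refit idea could be sketched roughly like this. The helper name tsne_features_for_new_rows and the random placeholder data are made up for illustration; the perplexity and random_state values are taken from the question.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_features_for_new_rows(x_train, x_new, perplexity=5, random_state=420):
    """Hypothetical helper: embed training and new rows together, then split them."""
    x_all = np.vstack([x_train, x_new])
    embedding = TSNE(
        n_components=2, perplexity=perplexity, random_state=random_state
    ).fit_transform(x_all)
    n_train = len(x_train)
    # The first n_train rows belong to the training table, the rest are the new rows.
    return embedding[:n_train], embedding[n_train:]


# Placeholder data standing in for the training table and newly arrived rows.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(60, 8))
x_new = rng.normal(size=(5, 8))

train_emb, new_emb = tsne_features_for_new_rows(x_train, x_new)
print(train_emb.shape, new_emb.shape)
```

Because every call re-runs the whole optimization, the training rows also receive fresh coordinates each time, so the cost grows with the size of the training table.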

answered Oct 16 '22 05:10

John Curry
