Why is cross_val_predict so much slower than fit for KNeighborsClassifier?

Tags: performance, python, machine-learning, scikit-learn, cross-validation

Running locally in a Jupyter notebook and using the MNIST dataset (28k entries, 28x28 pixels per image), the following takes 27 seconds:

from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier(n_jobs=1)
knn_clf.fit(pixels, labels)

However, the following takes 1722 seconds, in other words ~64 times longer:

from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(knn_clf, pixels, labels, cv=3, n_jobs=1)

My naive understanding is that cross_val_predict with cv=3 is doing 3-fold cross validation, so I'd expect it to fit the model 3 times, and so take at least ~3 times longer, but I don't see why it would take 64x!

To check whether it was something specific to my environment, I ran the same code in a Colab notebook. The difference was less extreme (15x), but still way above the ~3x I expected.

What am I missing? Why is cross_val_predict so much slower than just fitting the model?

In case it matters, I'm running scikit-learn 0.20.2.


asked Jan 22 '19 09:01

Dave Cahill

2 Answers

KNN is often called a lazy algorithm because fitting does little more than store the input data (and, depending on the algorithm setting, build a search index such as a KD-tree or ball tree). No model parameters are learned at all.

The actual distance calculations happen at predict time, for each test data point. So when you use cross_val_predict, KNN has to predict on every validation data point in each fold, and that is what makes the computation time so much higher!
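To see this for yourself, you could time fit and predict separately. This is only a rough sketch: it uses random data as a stand-in for MNIST (the array shapes and sample count are my assumptions, not the asker's data), and the exact timings will depend on your machine, your scikit-learn version, and which neighbor-search algorithm gets selected.

```python
import time

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for MNIST (assumption): 2,000 samples, 784 features each.
rng = np.random.RandomState(0)
X = rng.rand(2000, 784)
y = rng.randint(0, 10, size=2000)

knn = KNeighborsClassifier(n_jobs=1)

# fit: stores the training data and may build a search index.
start = time.perf_counter()
knn.fit(X, y)
fit_seconds = time.perf_counter() - start

# predict: the neighbor search for every query row happens here.
start = time.perf_counter()
pred = knn.predict(X)
predict_seconds = time.perf_counter() - start

print(f"fit: {fit_seconds:.3f}s, predict: {predict_seconds:.3f}s")
```

Timing fit alone, as in the question, measures only the first of these two steps.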

answered Nov 15 '22 03:11

Venkatachalam

cross_val_predict does a fit and a predict, so it might take longer than just fitting, but I did not expect it to take 64 times longer.
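For illustration, here is roughly what cross_val_predict with cv=3 does for a classifier, written as an explicit loop so the fit and predict time can be measured separately. Treat it as a sketch rather than the library's actual implementation: the random data is a stand-in for MNIST (an assumption), and I'm relying on an integer cv with a classifier defaulting to unshuffled StratifiedKFold.

```python
import time

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for MNIST (assumption): 1,500 samples, 784 features each.
rng = np.random.RandomState(0)
X = rng.rand(1500, 784)
y = rng.randint(0, 10, size=1500)

knn = KNeighborsClassifier(n_jobs=1)
y_pred = np.empty_like(y)
fit_total = 0.0
predict_total = 0.0

for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X, y):
    model = clone(knn)  # fresh, unfitted copy for each fold
    t0 = time.perf_counter()
    model.fit(X[train_idx], y[train_idx])  # fit on two thirds of the data
    t1 = time.perf_counter()
    y_pred[test_idx] = model.predict(X[test_idx])  # predict the held-out third
    t2 = time.perf_counter()
    fit_total += t1 - t0
    predict_total += t2 - t1

# Sanity check against the library call on the same data.
reference = cross_val_predict(knn, X, y, cv=3, n_jobs=1)
matches = np.array_equal(y_pred, reference)

print(f"fit total: {fit_total:.3f}s, predict total: {predict_total:.3f}s")
print("matches cross_val_predict:", matches)
```

Across the three folds, every sample gets predicted exactly once, so the predict cost covers the whole dataset on top of the three fits.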

answered Nov 15 '22 04:11

Louis D.
