Google Cloud ML-engine scikit-learn prediction probability 'predict_proba()'

Google Cloud ML-engine supports deploying scikit-learn Pipeline objects. For example, a text classification Pipeline could look like the following:

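from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', naive_bayes.MultinomialNB())])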

The classifier can then be trained:

classifier.fit(train_x, train_y)

Then the classifier can be serialized and uploaded to Google Cloud Storage:

import datetime, os, subprocess, sys
import joblib
# bucket_name is assumed to be defined elsewhere
model = 'model.joblib'
joblib.dump(classifier, model)
model_remote_path = os.path.join('gs://', bucket_name, datetime.datetime.now().strftime('model_%Y%m%d_%H%M%S'), model)
subprocess.check_call(['gsutil', 'cp', model, model_remote_path], stderr=sys.stdout)
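
One caveat with the upload snippet: os.path.join uses the local operating system's separator, so on Windows it would put backslashes inside the gs:// URI. A minimal sketch that builds the same path with plain string formatting instead (the function name build_model_path and the bucket name are hypothetical):

```python
import datetime


def build_model_path(bucket_name, now=None, filename='model.joblib'):
    """Return gs://<bucket>/model_<timestamp>/<filename> using '/' on every OS."""
    if now is None:
        now = datetime.datetime.now()
    folder = now.strftime('model_%Y%m%d_%H%M%S')
    return 'gs://{}/{}/{}'.format(bucket_name, folder, filename)


# Fixed timestamp so the result is reproducible.
print(build_model_path('my-bucket', datetime.datetime(2018, 9, 3, 14, 9, 0)))
# gs://my-bucket/model_20180903_140900/model.joblib
```

The returned string can be passed to gsutil cp in place of model_remote_path.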

A Model and a Version can then be created, either through the Google Cloud Console or programmatically, with the Version pointing at the Cloud Storage folder that contains 'model.joblib'.
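
For the programmatic route, a sketch using the discovery-based ml v1 client. It assumes the models.create and models.versions.create methods of that REST API; the runtime and Python version values you pass are placeholders to check against the supported-versions list, and the helper names are hypothetical. The service object is passed in, so the code itself does not import the Google client library.

```python
def version_body(version_name, deployment_uri, runtime_version, python_version):
    """Request body for projects.models.versions.create (scikit-learn model)."""
    return {
        'name': version_name,
        'deploymentUri': deployment_uri,  # Cloud Storage folder holding model.joblib
        'runtimeVersion': runtime_version,
        'pythonVersion': python_version,
        'framework': 'SCIKIT_LEARN',
    }


def create_model_and_version(ml, project_name, model_name, version_name,
                             deployment_uri, runtime_version, python_version):
    """Create the Model, then a Version that serves the file in deployment_uri.

    `ml` is the service object returned by discovery.build('ml', 'v1').
    Version creation is a long-running operation: the returned dict
    describes that operation, not the finished Version.
    """
    ml.projects().models().create(
        parent='projects/{}'.format(project_name),
        body={'name': model_name}).execute()
    return ml.projects().models().versions().create(
        parent='projects/{}/models/{}'.format(project_name, model_name),
        body=version_body(version_name, deployment_uri,
                          runtime_version, python_version)).execute()
```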

The classifier can then be used to predict new data by calling the deployed model's predict endpoint:

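from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
# Full resource name of the model, optionally narrowed to one version
name = 'projects/{}/models/{}'.format(project_name, model_name)
if version_name is not None:
    name += '/versions/{}'.format(version_name)
request_dict = {'instances': ['Test data']}
response = ml.projects().predict(name=name, body=request_dict).execute()
predictions = response['predictions']  # one predicted class per instance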
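
The call returns a plain dict. A small sketch of reading it, assuming the documented online-prediction response shape where results come back under 'predictions' and a failed request carries an 'error' field instead (the helper name and the 'spam' label are hypothetical):

```python
def extract_predictions(response):
    """Return the list of predictions from a projects.predict response dict."""
    if 'error' in response:
        raise RuntimeError('Prediction failed: {}'.format(response['error']))
    return response['predictions']


# With the stock Pipeline, each entry is a single predicted label:
print(extract_predictions({'predictions': ['spam']}))
# ['spam']
```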

The Google Cloud ML-engine calls the classifier's predict function and returns the predicted class. However, I would also like to return the confidence score. Normally this could be achieved by calling the classifier's predict_proba function, but there doesn't seem to be an option to change which function is called. My question is: is it possible to return the confidence score for a scikit-learn classifier when using the Google Cloud ML-engine? If not, do you have any recommendations for achieving this result another way?

Update: I've found a hacky solution. It involves overriding the classifier's predict method with its own predict_proba method:

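from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

nb = naive_bayes.MultinomialNB()
nb.predict = nb.predict_proba  # instance attribute shadows MultinomialNB.predict
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', nb)])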
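
The deployed service only ever sees the file written by joblib.dump, so it is worth confirming locally that the override survives serialization. A quick check on made-up toy data (the training texts and labels are hypothetical):

```python
import os
import tempfile

import joblib
from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical), just enough to fit the model.
train_x = ['good movie', 'great film', 'bad movie', 'awful film']
train_y = ['pos', 'pos', 'neg', 'neg']

nb = naive_bayes.MultinomialNB()
nb.predict = nb.predict_proba  # instance attribute shadows the class method
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', nb)])
classifier.fit(train_x, train_y)

# Round-trip through joblib, as the deployment does, then check that
# predict() now returns one probability row per input instead of a label.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'model.joblib')
    joblib.dump(classifier, path)
    restored = joblib.load(path)

probs = restored.predict(['good film'])
print(restored.classes_)  # column order of each probability row
print(probs)
```

A plausible reason the trick works where a custom subclass would not: pickle stores classes by their import path, so a subclass defined in your own module could not be loaded by the service without extra code, whereas the patched estimator still only references scikit-learn classes.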

Surprisingly this works. If anyone knows of a neater solution then please let me know.
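
With the hack deployed, the endpoint returns bare lists of floats with no labels attached. Each list follows the order of the trained Pipeline's classes_ attribute, so the client needs that order (for example, saved at training time) to interpret the numbers. A small sketch; the labels and values are made up:

```python
def label_probabilities(classes, probability_rows):
    """Pair each probability row with the class labels its columns refer to.

    `classes` must be in the order of the trained model's classes_
    (e.g. list(classifier.classes_)), since predict_proba columns follow it.
    """
    return [dict(zip(classes, row)) for row in probability_rows]


# Hypothetical 'predictions' from the hacked model for two instances:
predictions = [[0.2, 0.8], [0.9, 0.1]]
print(label_probabilities(['neg', 'pos'], predictions))
# [{'neg': 0.2, 'pos': 0.8}, {'neg': 0.9, 'pos': 0.1}]
```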

Update: Google have released a new feature (currently in beta) called Custom prediction routines, which lets you define the code that runs when a prediction request comes in. It adds more code to the solution, but it is certainly less hacky.
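
A minimal sketch of such a routine, assuming the Predictor interface described in the custom prediction routine documentation: a from_path classmethod that receives the local copy of the model directory, and a predict method that receives the list of instances. The class name ProbabilityPredictor is hypothetical, and it assumes the plain, un-patched Pipeline was saved as model.joblib.

```python
import os

import joblib


class ProbabilityPredictor(object):
    """Custom prediction routine that returns class probabilities (sketch)."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # One probability row per instance, columns ordered like classes_.
        # tolist() turns the numpy array into plain lists for the JSON response.
        probabilities = self._model.predict_proba(instances)
        return probabilities.tolist()

    @classmethod
    def from_path(cls, model_dir):
        # model_dir is assumed to be a local copy of the Version's model folder.
        model = joblib.load(os.path.join(model_dir, 'model.joblib'))
        return cls(model)
```

Because from_path and predict are ordinary Python, the class can be exercised locally before it is packaged; how to package it and reference it when creating the Version is described in the custom prediction routine documentation.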

asked Sep 03 '18 14:09

Alex Morgan

1 Answer

The ML Engine API you are using only exposes the predict method, as you can see in the documentation, so it will only return predictions (unless you force it to do something else with the hack you mentioned).

If you want to do something else with your trained model, you'll have to load it and use it yourself as a regular scikit-learn object. To use the model stored in Cloud Storage, you can do something like:

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from google.cloud import storage

bucket_name = "<BUCKET_NAME>"
gs_model = "path/to/model.joblib"  # path in your Cloud Storage bucket
local_model = "/path/to/model.joblib"  # path in your local machine

client = storage.Client()
bucket = client.get_bucket(bucket_name)
blob = bucket.blob(gs_model)
blob.download_to_filename(local_model)

model = joblib.load(local_model)
probabilities = model.predict_proba(test_data)  # test_data: a list of raw text strings
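
Since the question asks for a confidence score, here is one common way to derive it from the probabilities: the predicted class is the column with the largest probability, and that probability serves as the confidence. A sketch with a toy model standing in for the downloaded one (the training data is made up):

```python
import numpy as np
from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Stand-in for the model returned by joblib.load(); toy data is hypothetical.
model = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', naive_bayes.MultinomialNB())])
model.fit(['good movie', 'great film', 'bad movie', 'awful film'],
          ['pos', 'pos', 'neg', 'neg'])

test_data = ['good film', 'awful movie']
probabilities = model.predict_proba(test_data)

# argmax picks the most probable column in each row; classes_ maps the
# column index back to its label, and that row maximum is the confidence.
best = np.argmax(probabilities, axis=1)
labels = model.classes_[best]
confidence = probabilities[np.arange(len(test_data)), best]
for text, label, score in zip(test_data, labels, confidence):
    print('{!r}: {} ({:.2f})'.format(text, label, score))
```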

answered Oct 15 '22 09:10

rilla

Tags: python, google-cloud-platform, scikit-learn, google-cloud-ml