 

Scikit-learn custom score function needs values from dataset other than X and y

I'm trying to evaluate a model based on its performance on historical sports betting.

I have a dataset that consists of the following columns:

feature1 | ... | featureX | oddsPlayerA | oddsPlayerB | winner

The model will be doing a regression where the output is the odds that player A wins the match.

It is my understanding that I can use a custom scoring function that returns the "money" the model would have made if it placed a bet every time a condition held, and use that value to measure the fitness of the model. The condition would be something like:

if prediction_player_A_win_odds < oddsPlayerA:
    money += bet_playerA(oddsPlayerA, winner)
if inverse_odd(prediction_player_A_win_odds) < oddsPlayerB:
    money += bet_playerB(oddsPlayerB, winner)
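As a sketch of what those helpers could look like, assuming decimal odds and 1-unit stakes (`bet_playerA`, `bet_playerB`, and `inverse_odd` are hypothetical names from the pseudocode, not scikit-learn functions):

```python
def inverse_odd(odd):
    # Hypothetical helper: decimal odds of the complementary outcome.
    # odds -> implied probability p = 1/odd; complement 1-p -> odds 1/(1-p).
    # Assumes odd > 1.
    return 1.0 / (1.0 - 1.0 / odd)

def bet_playerA(oddsPlayerA, winner):
    # 1-unit stake on player A: profit oddsPlayerA - 1 on a win, -1 otherwise.
    return oddsPlayerA - 1.0 if winner == "A" else -1.0

def bet_playerB(oddsPlayerB, winner):
    # 1-unit stake on player B.
    return oddsPlayerB - 1.0 if winner == "B" else -1.0
```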

In the custom scoring function I need to receive the usual arguments like ground_truth and predictions (where ground_truth is winner[] and predictions is prediction_player_A_win_odds[]), but I also need the fields oddsPlayerA and oddsPlayerB from the dataset (and here is the problem!).

If the custom scoring function were called with the data in the exact same order as the original dataset, it would be trivial to retrieve the extra fields needed from the dataset. But in reality, when using cross-validation methods, the data it receives is shuffled relative to the original.

I've tried the most obvious approach, which was to pass the y variable as [oddsA, oddsB, winner] (shape [n, 3]), but scikit-learn didn't allow it.

So, how can I get data from the dataset into the custom scoring function that is neither X nor y but is still "tied together" in the same order?

joaoroque asked Nov 03 '14 00:11



1 Answer

There is no way to actually do this at the moment, sorry. You can write your own loop over the cross-validation folds, which should not be too hard. You cannot do this using GridSearchCV or cross_val_score.
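A minimal sketch of such a manual loop, keeping the odds columns in arrays aligned row-for-row with X and y so that the same fold indices slice every array consistently (the profit rule and all names here are illustrative assumptions, not part of the original answer):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

def profit_score(pred_odds_a, odds_a, odds_b, winner):
    """Total profit from 1-unit bets placed whenever the model's
    predicted odds for player A beat the bookmaker's line.
    Assumes decimal odds and predicted odds > 1."""
    money = 0.0
    for p, oa, ob, w in zip(pred_odds_a, odds_a, odds_b, winner):
        if p < oa:  # model thinks A is more likely than the line implies
            money += (oa - 1.0) if w == "A" else -1.0
        inv = 1.0 / (1.0 - 1.0 / p)  # complementary decimal odds
        if inv < ob:
            money += (ob - 1.0) if w == "B" else -1.0
    return money

def cv_profit(model, X, y, odds_a, odds_b, winner, n_splits=5):
    # X, y, odds_a, odds_b, winner are aligned row-for-row, so indexing
    # all of them with the same fold indices keeps them "tied together".
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        scores.append(profit_score(pred, odds_a[test_idx],
                                   odds_b[test_idx], winner[test_idx]))
    return scores
```

Because you control the fold indices yourself, the extra columns never go through scikit-learn's scorer machinery at all, which sidesteps the restriction on y's shape.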

Andreas Mueller answered Nov 02 '22 23:11