With the dataframe below, I want to maximize the total return while certain bounds are satisfied.
import pandas as pd

d = {'Win': [0, 0, 1, 0, 0, 1, 0],
     'Men': [0, 1, 0, 1, 1, 0, 0],
     'Women': [1, 0, 1, 0, 0, 1, 1],
     'Matches': [0, 5, 4, 7, 4, 10, 13],
     'Odds': [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1],
     'investment': [0, 0, 6, 10, 5, 25, 0]}
data = pd.DataFrame(d)
I want to maximize the following equation:
totalreturn = np.sum(data['Odds'] * data['investment'] * (data['Win'] == 1))
The function should be maximized subject to the following bounds:
for i in range(len(data)):
    investment = data['investment'][i]
    C = alpha0 + alpha1 * data['Men'][i] + alpha2 * data['Women'][i] + alpha3 * data['Matches'][i]
    if not (lb < investment < ub and investment > C):
        data.loc[i, 'investment'] = 0
Here lb and ub are constant for every row in the dataframe. The threshold C, however, differs per row. So there are six parameters to be optimized: lb, ub, alpha0, alpha1, alpha2, alpha3.
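For illustration, the row-dependent threshold and all three conditions can be evaluated in a vectorized way, without a Python loop. This is only a sketch with arbitrarily chosen parameter values (they are not optimized):

```python
import numpy as np
import pandas as pd

d = {'Win': [0, 0, 1, 0, 0, 1, 0], 'Men': [0, 1, 0, 1, 1, 0, 0],
     'Women': [1, 0, 1, 0, 0, 1, 1], 'Matches': [0, 5, 4, 7, 4, 10, 13],
     'Odds': [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1],
     'investment': [0, 0, 6, 10, 5, 25, 0]}
data = pd.DataFrame(d)

# Hypothetical parameter values, purely for illustration
lb, ub = 1.0, 50.0
alpha0, alpha1, alpha2, alpha3 = 0.0, 1.0, 1.0, 0.1

# Row-dependent threshold C(i), computed for all rows at once
C = alpha0 + alpha1 * data['Men'] + alpha2 * data['Women'] + alpha3 * data['Matches']

# Zero out every investment that violates any of the three conditions
mask = (data['investment'] > lb) & (data['investment'] < ub) & (data['investment'] > C)
investment = data['investment'].where(mask, 0)

totalreturn = np.sum(data['Odds'] * investment * (data['Win'] == 1))
# Only the winning rows 2 and 5 contribute: 1.95*6 + 1.8*25 = 56.7
```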
Can anyone tell me how to do this in Python? So far I have tried scipy (Approach 1) and Bayesian (Approach 2) optimization, optimizing only lb and ub.
Approach 1:
import pandas as pd
from scipy.optimize import minimize

def objective(val, data):
    # Approach 1
    # Lower bound and upper bound
    lb, ub = val
    # Work on a copy so repeated optimizer calls don't permanently zero out rows
    data = data.copy()
    # Investments: these matches/bets are selected to put a wager on
    tf1 = (data['investment'] > lb) & (data['investment'] < ub)
    data.loc[~tf1, 'investment'] = 0
    # Total investment
    totalinvestment = data['investment'].sum()
    # Well-placed bets
    data['reward'] = data['Odds'] * data['investment'] * (data['Win'] == 1)
    totalreward = data['reward'].sum()
    # Return and cumulative return
    data['return'] = data['reward'] - data['investment']
    totalreturn = data['return'].sum()
    data['Cum return'] = data['return'].cumsum()
    # Return on investment
    print('\nlb, ub:', lb, ub)
    print('TotalReturn:', totalreturn)
    print('TotalInvestment:', totalinvestment)
    print('TotalReward:', totalreward)
    print('# of bets:', (data['investment'] != 0).sum())
    # minimize() minimizes, so return the negated return to maximize it
    return -totalreturn

# Bounds and constraints
b = (0, 100)
bnds = (b, b)
x0 = [0, 100]
sol = minimize(objective, x0, args=(data,), method='Nelder-Mead', bounds=bnds)
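Approach 1 can be extended to all six parameters by letting the objective unpack them and apply the row-dependent threshold. This is a sketch: the alpha bounds and start values are guesses, and since the objective is piecewise constant in the parameters, Nelder-Mead can easily stall on a plateau (bounds with Nelder-Mead also require SciPy >= 1.7):

```python
import pandas as pd
from scipy.optimize import minimize

d = {'Win': [0, 0, 1, 0, 0, 1, 0], 'Men': [0, 1, 0, 1, 1, 0, 0],
     'Women': [1, 0, 1, 0, 0, 1, 1], 'Matches': [0, 5, 4, 7, 4, 10, 13],
     'Odds': [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1],
     'investment': [0, 0, 6, 10, 5, 25, 0]}
data = pd.DataFrame(d)

def objective(val, data):
    # Unpack all six parameters
    lb, ub, alpha0, alpha1, alpha2, alpha3 = val
    inv = data['investment'].copy()          # don't mutate the original frame
    C = alpha0 + alpha1 * data['Men'] + alpha2 * data['Women'] + alpha3 * data['Matches']
    inv = inv.where((inv > lb) & (inv < ub) & (inv > C), 0)
    reward = data['Odds'] * inv * (data['Win'] == 1)
    # minimize() minimizes, so negate the total return
    return -(reward - inv).sum()

x0 = [0, 100, 0, 0, 0, 0]
bnds = [(0, 100), (0, 100)] + [(-10, 10)] * 4   # guessed alpha ranges
sol = minimize(objective, x0, args=(data,), method='Nelder-Mead', bounds=bnds)
print(sol.x, -sol.fun)
```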
Approach 2:
import pandas as pd
import time
import pickle
from hyperopt import fmin, tpe, Trials
from hyperopt import STATUS_OK
from hyperopt import hp

def objective(args):
    # Approach 2
    # Lower bound and upper bound
    lb, ub = args
    # Work on a copy so repeated trials don't permanently zero out rows
    df = data.copy()
    # Investments: these matches/bets are selected to put a wager on
    tf1 = (df['investment'] > lb) & (df['investment'] < ub)
    df.loc[~tf1, 'investment'] = 0
    # Total investment
    totalinvestment = df['investment'].sum()
    # Well-placed bets
    df['reward'] = df['Odds'] * df['investment'] * (df['Win'] == 1)
    totalreward = df['reward'].sum()
    # Return and cumulative return
    df['return'] = df['reward'] - df['investment']
    totalreturn = df['return'].sum()
    df['Cum return'] = df['return'].cumsum()
    # Store results; fmin minimizes the loss, so negate the return
    d = {'loss': -totalreturn, 'status': STATUS_OK, 'eval time': time.time(),
         'other stuff': {'type': None, 'value': [0, 1, 2]},
         'attachments': {'time_module': pickle.dumps(time.time)}}
    return d

trials = Trials()
parameter_space = [hp.uniform('lb', 0, 100), hp.uniform('ub', 0, 100)]
best = fmin(objective,
            space=parameter_space,
            algo=tpe.suggest,
            max_evals=500,
            trials=trials)
print('\n', trials.best_trial)
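Extending Approach 2 to all six parameters mainly needs an objective that unpacks them and a wider search space. A sketch (the alpha ranges are guesses; the `fmin` call is shown commented out so the snippet runs without a tuning session, since `fmin` also accepts a bare float loss instead of the dict):

```python
import pandas as pd

d = {'Win': [0, 0, 1, 0, 0, 1, 0], 'Men': [0, 1, 0, 1, 1, 0, 0],
     'Women': [1, 0, 1, 0, 0, 1, 1], 'Matches': [0, 5, 4, 7, 4, 10, 13],
     'Odds': [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1],
     'investment': [0, 0, 6, 10, 5, 25, 0]}
data = pd.DataFrame(d)

def objective(args):
    lb, ub, alpha0, alpha1, alpha2, alpha3 = args
    inv = data['investment'].copy()
    C = alpha0 + alpha1 * data['Men'] + alpha2 * data['Women'] + alpha3 * data['Matches']
    inv = inv.where((inv > lb) & (inv < ub) & (inv > C), 0)
    reward = data['Odds'] * inv * (data['Win'] == 1)
    # fmin minimizes the loss, so negate the total return
    return -(reward - inv).sum()

loss = objective([0, 100, 0, 0, 0, 0])   # all nonzero investments kept

# The search space then gains four more dimensions (ranges are guesses):
# parameter_space = [hp.uniform('lb', 0, 100), hp.uniform('ub', 0, 100),
#                    hp.uniform('alpha0', -10, 10), hp.uniform('alpha1', -10, 10),
#                    hp.uniform('alpha2', -10, 10), hp.uniform('alpha3', -10, 10)]
# best = fmin(objective, space=parameter_space, algo=tpe.suggest, max_evals=500)
```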
Does anyone know how I should proceed? Scipy doesn't generate the desired result, while the hyperopt optimization does. In either approach I don't know how to incorporate a bound that is row-dependent (C(i)).
Anything would help! (Related articles, exercises, or helpful explanations about this sort of optimization are also more than welcome.)
I think your formulation needs one more variable, which would be binary and would define whether the investment should be set to 0 or keep its initial value. Assuming that this variable is saved in another column called 'new_binary', your objective function could be changed as follows:
totalreturn = np.sum(data['Odds'] * data['investment'] * data['new_binary'] * data['Win'])
Then, the only thing missing is introducing the variable itself. No loop is needed; the threshold and the comparisons are vectorized over the whole dataframe:
C = alpha0 + alpha1 * data['Men'] + alpha2 * data['Women'] + alpha3 * data['Matches']
data['new_binary'] = (lb < data['investment']) & (data['investment'] < ub) & (data['investment'] > C)
# The comparisons yield booleans, which in Python are easily treated as 0 and 1 in arithmetic.
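With some fixed hypothetical parameters (chosen only to illustrate, not optimized), the binary column and the modified objective behave like this:

```python
import numpy as np
import pandas as pd

d = {'Win': [0, 0, 1, 0, 0, 1, 0], 'Men': [0, 1, 0, 1, 1, 0, 0],
     'Women': [1, 0, 1, 0, 0, 1, 1], 'Matches': [0, 5, 4, 7, 4, 10, 13],
     'Odds': [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1],
     'investment': [0, 0, 6, 10, 5, 25, 0]}
data = pd.DataFrame(d)

# Arbitrary example parameters
lb, ub = 5, 50
alpha0, alpha1, alpha2, alpha3 = 0.0, 1.0, 1.0, 0.1

C = alpha0 + alpha1 * data['Men'] + alpha2 * data['Women'] + alpha3 * data['Matches']
data['new_binary'] = ((lb < data['investment']) & (data['investment'] < ub)
                      & (data['investment'] > C)).astype(int)

totalreturn = np.sum(data['Odds'] * data['investment'] * data['new_binary'] * data['Win'])
# new_binary selects rows 2, 3 and 5; only the winning rows 2 and 5 contribute
```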
The only problem that I see now is that this problem becomes integer, so I am not sure if scipy.optimize.minimize would do. I am not sure what could be an alternative, but according to this, PuLP and Pyomo could work.
I assume here that you cannot go through the whole dataset, or that it is incomplete, or that you want to extrapolate, so that you cannot calculate all combinations.
In the case where you have no prior, and if you are uncertain about the smoothness, or evaluations could be costly, I would use Bayesian optimization. You can control the exploration/exploitation trade-off and avoid getting stuck in a minimum.
I would use scikit-optimize, which in my opinion implements Bayesian optimization better. It has better initialization techniques, like the Sobol' method, which is implemented correctly here. This ensures that your search space is properly sampled.
from skopt import gp_minimize

# objective and bnds as in Approach 1; gp_minimize calls objective(x) with a
# single argument and minimizes it, so wrap the objective and make sure it
# returns the negated total return
res = gp_minimize(lambda val: objective(val, data), bnds,
                  initial_point_generator='sobol')