
Python: How to optimize function parameters?

Background:

I'd like to solve a wide array of optimization problems such as asset weights in a portfolio, and parameters in trading strategies where the variables are passed to functions containing a bunch of other variables as well.

Until now, I've been able to do these things easily in Excel using the Solver Add-In. But I think it would be much more efficient and even more widely applicable using Python. For the sake of clarity, I'm going to boil the question down to the essence of portfolio optimization.

My question (short version):

Here's a dataframe and a corresponding plot with asset returns.

Dataframe 1:

                A1      A2
2017-01-01  0.0075  0.0096
2017-01-02 -0.0075 -0.0033
.
.
2017-01-10  0.0027  0.0035

Plot 1 - Asset returns


Based on that, I would like to find the weights of the portfolio that is optimal with regard to risk / return (the Sharpe ratio), represented by the green dot in the plot below. (The red dot is the so-called minimum variance portfolio, which represents another optimization problem.)

Plot 2 - Efficient frontier and optimal portfolios:


How can I do this with numpy or scipy?

The details:

The following code section contains the function returns(), which builds a dataframe with random returns for two assets, and the function pf_sharpe(), which calculates the Sharpe ratio of a portfolio of those returns for two given weights.

# imports
import pandas as pd
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

np.random.seed(1234)

# Reproducible data sample
def returns(rows, names):
    ''' Function to create data sample with random returns
    
    Parameters
    ==========
    rows : number of rows in the dataframe
    names: list of names to represent assets
    
    Example
    =======
    
    >>> returns(rows = 2, names = ['A', 'B'])
    
                  A       B
    2017-01-01  0.0027  0.0075
    2017-01-02 -0.0050 -0.0024
    '''
    listVars= names
    rng = pd.date_range('1/1/2017', periods=rows, freq='D')
    df_temp = pd.DataFrame(np.random.randint(-100,100,size=(rows, len(listVars))), columns=listVars) 
    df_temp = df_temp.set_index(rng)
    df_temp = df_temp / 10000

    return df_temp


# Sharpe ratio
def pf_sharpe(df, w1, w2):
    ''' Function to calculate risk / reward ratio
        based on a pandas dataframe with two return series
    
    Parameters
    ==========
    df : pandas dataframe
    w1 : portfolio weight for asset 1
    w2 : portfolio weight for asset 2
    
    '''
    
    weights = [w1,w2]      
    
    # Calculate portfolio returns and volatility
    pf_returns = (np.sum(df.mean() * weights) * 252)
    pf_volatility = (np.sqrt(np.dot(np.asarray(weights).T, np.dot(df.cov() * 252, weights))))
       
    # Calculate sharpe ratio
    pf_sharpe = pf_returns / pf_volatility
    
    return pf_sharpe

# Make df with random returns and calculate
# sharpe ratio for a 80/20 split between assets
df_returns = returns(rows = 10, names = ['A1', 'A2'])
df_returns.plot(kind = 'bar')

sharpe = pf_sharpe(df = df_returns, w1 = 0.8, w2 = 0.2)
print(sharpe)

# Output:
# 5.09477512073

Now I'd like to find the portfolio weights that optimize the Sharpe ratio. I think you could express the optimization problem as follows:

maximize:
    pf_sharpe()

by changing:
    w1, w2

under the constraints:
    0 < w1 < 1
    0 < w2 < 1
    w1 + w2 = 1

What I've tried so far:

I found a possible setup in the post Python Scipy Optimization.minimize using SLSQP showing maximized results. Below is what I have so far, and it addresses a central aspect of my question directly:

[...]where the variables are passed to functions containing a bunch of other variables as well.

As you can see, my initial challenge prevents me from even testing whether my bounds and constraints will be accepted by the function optimize.minimize(). I haven't even bothered to take into consideration the fact that this is a maximization and not a minimization problem (hopefully fixable by changing the sign of the function).

Attempts:

# bounds
b = (0,1)
bnds = (b,b)

# constraints
def constraint1(w1,w2):
    return w1 - w2

cons = ({'type': 'eq', 'fun':constraint1})

# initial guess
x0 = [0.5, 0.5]

# Testing the initial guess
print(pf_sharpe(df = df_returns, weights = x0))

# Optimization attempts

attempt1 = optimize.minimize(pf_sharpe(), x0, method = 'SLSQP', bounds = bnds, constraints = cons)
attempt2 = optimize.minimize(pf_sharpe(df = df_returns, weights),  x0, method = 'SLSQP', bounds = bnds, constraints = cons)
attempt3 = optimize.minimize(pf_sharpe(weights, df = df_returns), x0, method = 'SLSQP', bounds = bnds, constraints = cons)

Results:

  • Attempt1 is closest to the scipy setup here, but understandably fails because neither df nor weights have been specified.
  • Attempt2 fails with SyntaxError: positional argument follows keyword argument
  • Attempt3 fails with NameError: name 'weights' is not defined

I was under the impression that df could freely be specified, and that x0 in optimize.minimize would be considered the variables to be tested as 'representatives' for the weights in the function specified by pf_sharpe().

As you surely understand, my transition from Excel to Python in this regard has not been the easiest, and there is plenty I don't understand here. Anyway, I'm hoping some of you may offer some suggestions or clarifications!

Thank you!

Appendix 1 - Simulation approach:

This particular portfolio optimization problem can easily be solved by simulating a bunch of portfolio weights. And I did exactly that to produce the portfolio plot above. Here's the whole function if anyone is interested:

# Portfolio simulation
def portfolioSim(df, simRuns):
    ''' Function to take a df with asset returns,
        runs a number of simulated portfolio weights,
        plots return and risk for those weights,
        and finds minimum risk portfolio
        and max risk / return portfolio
    
    Parameters
    ==========
    df : pandas dataframe with returns
    simRuns : number of simulations
    
    '''  
    prets = []
    pvols = []
    pwgts = []
    names = list(df)
    
    for p in range(simRuns):
        
        # Assign random weights that sum to 1
        weights = np.random.random(len(names))
        weights /= np.sum(weights)
        weights = np.asarray(weights)
    
        # Calculate risk and returns with random weights,
        # using the df argument rather than the global df_returns
        prets.append(np.sum(df.mean() * weights) * 252)
        pvols.append(np.sqrt(np.dot(weights.T, np.dot(df.cov() * 252, weights))))
        pwgts.append(weights)
            
    prets = np.array(prets)
    pvols = np.array(pvols)
    pwgts = np.array(pwgts)
    pshrp = prets / pvols
    
    # Store calculations in a df
    df1 = pd.DataFrame({'return':prets})         
    df2 = pd.DataFrame({'risk':pvols})    
    df3 = pd.DataFrame(pwgts)
    df3.columns = names
    df4 = pd.DataFrame({'sharpe':pshrp})
    df_temp = pd.concat([df1, df2, df3, df4], axis = 1)
    
    # Plot results
    plt.figure(figsize=(8, 4))
    plt.scatter(pvols, prets, c=prets / pvols, cmap = 'viridis', marker='o')
    
    # Min risk
    min_vol_port = df_temp.iloc[df_temp['risk'].idxmin()]   
    plt.plot([min_vol_port['risk']], [min_vol_port['return']], marker='o', markersize=12, color="red")
    
    # Max sharpe
    max_sharpe_port = df_temp.iloc[df_temp['sharpe'].idxmax()]
    plt.plot([max_sharpe_port['risk']], [max_sharpe_port['return']], marker='o', markersize=12, color="green")

    # Return the max Sharpe portfolio so the caller can inspect it
    return max_sharpe_port

# Test run
portfolioSim(df = df_returns, simRuns = 250)
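
The simulation loop above can also be sketched in vectorised form. This is only a minimal sketch under stated assumptions: the helper simulate_portfolios, its parameters, and the input numbers are hypothetical and not part of the original code. It takes a mean-return vector and a covariance matrix as NumPy arrays, and it uses np.random.default_rng rather than np.random.seed, so the draws will differ from those above.

```python
import numpy as np

def simulate_portfolios(mean_returns, cov_matrix, sim_runs, periods=252, seed=0):
    """Draw random long-only weights and score each portfolio (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # One row of weights per simulated portfolio, each row normalised to sum to 1
    weights = rng.random((sim_runs, len(mean_returns)))
    weights /= weights.sum(axis=1, keepdims=True)
    # Annualised return and volatility for every row at once
    rets = (weights @ mean_returns) * periods
    vols = np.sqrt(np.einsum('ij,jk,ik->i', weights, cov_matrix * periods, weights))
    return weights, rets, vols, rets / vols

# Illustrative inputs (made up, not the data from the question)
mu = np.array([0.0010, 0.0008])
cov = np.array([[4.0e-4, 1.0e-4],
                [1.0e-4, 2.25e-4]])

weights, rets, vols, sharpe = simulate_portfolios(mu, cov, sim_runs=1000)
best = sharpe.argmax()    # index of the max Sharpe portfolio
safest = vols.argmin()    # index of the minimum volatility portfolio
print(weights[best], sharpe[best])
print(weights[safest], vols[safest])
```

With the dataframe from this question, the inputs could be taken as df_returns.mean().values and df_returns.cov().values.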

Appendix 2 - Excel Solver approach:

Here is how I would approach the problem using Excel Solver. Instead of linking to a file, I've only attached a screenshot and included the most important formulas in a code section. I'm guessing not many of you are going to be interested in reproducing this anyway, but I've included it just to show that it can be done quite easily in Excel. Grey ranges represent formulas. Ranges that can be changed and used as arguments in the optimization problem are highlighted in yellow. The green range is the objective function.

Here's an image of the worksheet and Solver setup:


Excel formulas:

C3  =AVERAGE(C7:C16)
C4  =AVERAGE(D7:D16)
H4  =COVARIANCE.P(C7:C16;D7:D16)
G5  =COVARIANCE.P(C7:C16;D7:D16)
G10 =G8+G9
G13 =MMULT(TRANSPOSE(G8:G9);C3:C4)
G14 =SQRT(MMULT(TRANSPOSE(G8:G9);MMULT(G4:H5;G8:G9)))
H13 =G12/G13
H14 =G13*252
G16 =G13/G14
H16 =H13/H14

End notes:

As you can see from the screenshot, Excel Solver suggests a 47% / 53% split between A1 and A2 to obtain an optimal Sharpe ratio of 5.6. Running the Python function sr_opt = portfolioSim(df = df_returns, simRuns = 25000) yields a Sharpe ratio of 5.3 with corresponding weights of 47% and 53% for A1 and A2:

print(sr_opt)
#Output
#return    0.361439
#risk      0.067851
#A1        0.465550
#A2        0.534450
#sharpe    5.326933

The method applied in Excel is GRG Nonlinear. I understand that changing the SLSQP argument to a non-linear method would get me somewhere, and I've looked into nonlinear solvers in SciPy as well, but with little success. And maybe SciPy isn't even the best option here?

asked Apr 09 '18 11:04 by vestland




1 Answer

Here is a more detailed answer. The first part of your code remains the same:

import pandas as pd
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

np.random.seed(1234)

# Reproducible data sample
def returns(rows, names):
    ''' Function to create data sample with random returns

    Parameters
    ==========
    rows : number of rows in the dataframe
    names: list of names to represent assets

    Example
    =======

    >>> returns(rows = 2, names = ['A', 'B'])

                  A       B
    2017-01-01  0.0027  0.0075
    2017-01-02 -0.0050 -0.0024
    '''
    listVars= names
    rng = pd.date_range('1/1/2017', periods=rows, freq='D')
    df_temp = pd.DataFrame(np.random.randint(-100,100,size=(rows, len(listVars))), columns=listVars) 
    df_temp = df_temp.set_index(rng)
    df_temp = df_temp / 10000

    return df_temp

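The function pf_sharpe is modified so that its first input is the parameter to be optimised: the weight of the first asset. minimize always passes the variables being optimised as the first argument (a 1-D array) and forwards anything given in args after it, which is how df reaches the function. Instead of supplying the constraint w1 + w2 = 1, we can define w2 as 1 - w1 inside pf_sharpe, which is equivalent for two assets but simpler and faster. Also, minimize will attempt to minimize pf_sharpe, whereas you actually want to maximize it, so the output of pf_sharpe is now multiplied by -1.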
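
Before the modified function, here is a toy sketch of the calling convention it relies on. The functions score and neg_score and the parameter target are hypothetical names used only for illustration. The sketch shows that minimize receives the function object itself (not the result of calling it), calls it as fun(x, *args) with x as a 1-D array, and that negating the objective turns the minimization into a maximization.

```python
from scipy.optimize import minimize

def score(x, target):
    # Toy objective with its maximum at x[0] == target (illustration only)
    return -(x[0] - target) ** 2

def neg_score(x, target):
    # minimize calls this as neg_score(x, *args); x is a 1-D array of trial
    # values and target arrives through args. The sign is flipped so that
    # minimizing neg_score maximizes score.
    return -score(x, target)

# Pass the function itself, without parentheses; extra inputs go in args
res = minimize(neg_score, x0=[0.5], args=(0.3,), method='SLSQP', bounds=[(0, 1)])
print(res.x)      # value of x that maximizes score
print(-res.fun)   # the maximized score
```

This is also why the attempts in the question fail: pf_sharpe(...) with parentheses is evaluated immediately, before minimize ever sees it.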

# Sharpe ratio
def pf_sharpe(weight, df):
    ''' Function to calculate risk / reward ratio
        based on a pandas dataframe with two return series
    '''   
    weights = [weight[0], 1-weight[0]]
    # Calculate portfolio returns and volatility
    pf_returns = (np.sum(df.mean() * weights) * 252)
    pf_volatility = (np.sqrt(np.dot(np.asarray(weights).T, np.dot(df.cov() * 252, weights))))

    # Calculate sharpe ratio
    pf_sharpe = pf_returns / pf_volatility

    return -pf_sharpe

# initial guess
x0 = [0.5]

df_returns = returns(rows = 10, names = ['A1', 'A2'])

# Optimization attempts

out = minimize(pf_sharpe, x0, method='SLSQP', bounds=[(0, 1)], args=(df_returns,))

# out.x is a 1-D array with a single element (the weight of the first asset)
optimal_weights = [out.x[0], 1 - out.x[0]]
print(optimal_weights)
print(-pf_sharpe(out.x, df_returns))

This returns an optimized Sharpe ratio of 6.16 (better than the 5.3 from the simulation), with w1 practically 1 and w2 practically 0.
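
The substitution w2 = 1 - w1 only works for two assets. For more assets, one option is to keep every weight as a variable and hand the sum-to-one condition to SLSQP as an equality constraint. The following is a minimal sketch under stated assumptions, not a definitive implementation: neg_sharpe and max_sharpe_weights are hypothetical helper names, the risk-free rate is taken as zero as in pf_sharpe above, and the input numbers are made up.

```python
import numpy as np
from scipy.optimize import minimize

def neg_sharpe(weights, mean_returns, cov_matrix, periods=252):
    # Annualised return and volatility; risk-free rate assumed to be zero
    weights = np.asarray(weights)
    pf_return = np.dot(mean_returns, weights) * periods
    pf_vol = np.sqrt(weights @ (cov_matrix * periods) @ weights)
    return -pf_return / pf_vol    # negated because minimize minimizes

def max_sharpe_weights(mean_returns, cov_matrix):
    n = len(mean_returns)
    x0 = np.full(n, 1.0 / n)      # start from equal weights
    bounds = [(0.0, 1.0)] * n     # 0 <= w_i <= 1 for every asset
    # An 'eq' constraint function must return 0 when the constraint holds
    constraints = {'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0}
    res = minimize(neg_sharpe, x0, args=(mean_returns, cov_matrix),
                   method='SLSQP', bounds=bounds, constraints=constraints)
    return res.x, -res.fun

# Illustrative inputs (made up, not the data from the question)
mu = np.array([0.0010, 0.0008, 0.0005])
cov = np.array([[4.0e-4, 1.0e-4, 0.0],
                [1.0e-4, 2.25e-4, 0.0],
                [0.0, 0.0, 1.0e-4]])

weights, sharpe = max_sharpe_weights(mu, cov)
print(weights, sharpe)
```

With the dataframe from the question, the inputs could be taken as df_returns.mean().values and df_returns.cov().values. Note that the constraint function receives the whole weight array as a single argument, just like the objective.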

answered Oct 25 '22 00:10 by Brenlla