My problem is quite common in finance.
Given a vector w (1xN) of weights and a covariance matrix Q (NxN) of assets, one can calculate the variance of the portfolio using the quadratic form w * Q * w', where * is the matrix product. Its square root is the portfolio volatility.
I want to understand the best way to perform this operation when I have a history of weights W (T x N) and a 3D structure for the covariance matrices (T, N, N).
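For a single date, the quadratic form looks like this. It is only a minimal sketch: the two-asset weights and covariance numbers are made up for illustration.

```python
import numpy as np

# Hypothetical two-asset example (numbers invented for illustration)
w = np.array([0.6, 0.4])                 # portfolio weights
Q = np.array([[0.04, 0.01],
              [0.01, 0.09]])             # covariance matrix of asset returns

variance = w @ Q @ w                     # quadratic form w Q w'
volatility = np.sqrt(variance)           # portfolio standard deviation

# By hand: 0.6**2 * 0.04 + 2 * 0.6 * 0.4 * 0.01 + 0.4**2 * 0.09 = 0.0336
print(variance, volatility)
```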
import numpy as np
import pandas as pd
returns = pd.DataFrame(0.1 * np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])
covariance = returns.rolling(20).cov()
weights = pd.DataFrame(np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])
My solution so far has been to convert the pandas DataFrames to numpy, perform the calculation in a loop, and then convert back to pandas. Note that I need to explicitly check the alignment of labels, since in reality covariance and weights could be calculated by different processes.
# One (N x N) covariance DataFrame per date
cov_dict = {key: covariance.xs(key, axis=0, level=0) for key in covariance.index.get_level_values(0).unique()}

def naive_numpy(weights, cov_dict):
    expected_risk = {}
    # Assets present in both the covariance matrices and the weights
    cov_assets = cov_dict[next(iter(cov_dict))].columns
    avail_assets = [el for el in cov_assets if el in weights]
    # Dates present in both
    cov_dates = list(cov_dict.keys())
    avail_dates = weights.index.intersection(cov_dates)
    sel_weights = weights.loc[avail_dates, avail_assets]
    # Main loop: sqrt(w Q w') for each date
    for t, value in zip(sel_weights.index, sel_weights.values):
        # Select rows and columns explicitly so Q is aligned with the weights
        q = cov_dict[t].loc[avail_assets, avail_assets].values
        expected_risk[t] = np.sqrt(np.dot(value, np.dot(q, value)))
    # Back to a pandas Series on the original weights index
    expected_risk = pd.Series(expected_risk).reindex(weights.index).sort_index()
    return expected_risk
Is there a pure-pandas way to achieve the same result? Or is there any way to improve the code to make it more efficient? (Despite using numpy, it is still quite slow.)
I think numpy is definitely the best option, though you lose that efficiency if you loop over values/dates.
My suggestion for calculating the rolling volatility of a portfolio (with no looping):
returns = pd.DataFrame(0.1 * np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])
covariance = returns.rolling(20).cov()
weights = pd.DataFrame(np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])
rows, columns = weights.shape
# Go to numpy:
w = weights.values
cov = covariance.values.reshape(rows, columns, columns)
A = np.matmul(w.reshape(rows, 1, columns), cov)
var = np.matmul(A, w.reshape(rows, columns, 1)).reshape(rows)
std_dev = np.sqrt(var)
# Back to pandas (in case you want that):
pd.Series(std_dev, index = weights.index)
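The reshape above relies on covariance and weights sharing the same dates and the same asset order. If they may come from different processes, as the question mentions, a variant along these lines could align the labels first and then do the same batched product with np.einsum. It is only a sketch: the function name portfolio_vol is made up here, and it assumes covariance has the (date, asset) MultiIndex layout produced by rolling().cov().

```python
import numpy as np
import pandas as pd

def portfolio_vol(weights, covariance):
    """Hypothetical helper: per-date sqrt(w Q w') with label alignment.

    weights:    (T x N) DataFrame indexed by date, one column per asset.
    covariance: DataFrame with a (date, asset) MultiIndex and asset columns,
                as returned by DataFrame.rolling(...).cov().
    """
    # Assets and dates available in both inputs
    assets = [a for a in covariance.columns if a in weights.columns]
    cov_dates = covariance.index.get_level_values(0).unique()
    dates = weights.index.intersection(cov_dates)

    # Reorder covariance rows and columns to match (dates, assets)
    full_index = pd.MultiIndex.from_product([dates, assets])
    cov = covariance.reindex(index=full_index, columns=assets).to_numpy()
    cov = cov.reshape(len(dates), len(assets), len(assets))

    w = weights.loc[dates, assets].to_numpy()

    # Batched quadratic form: var[t] = sum_ij w[t, i] * cov[t, i, j] * w[t, j]
    var = np.einsum('ti,tij,tj->t', w, cov, w)

    # Dates missing from covariance come back as NaN
    return pd.Series(np.sqrt(var), index=dates).reindex(weights.index)
```

Called as portfolio_vol(weights, covariance), it returns a Series on weights.index; dates inside the initial rolling window stay NaN because the covariance itself is NaN there.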