The purpose of this question is to explore MultiIndex DataFrames further and to ask about the best approach for various tasks.
Create the DataFrame
import pandas as pd
df = pd.DataFrame({'index_date' : ['12/07/2016','12/07/2016','12/07/2016','12/07/2016','12/07/2016'],
'portfolio' : ['A','B','C','D','E'],
'reporting_ccy' : ['GBP','GBP','GBP','GBP','GBP'],
'portfolio_ccy' : ['JPY','USD','USD','EUR','EUR'],
'amount' : [100,200,300,400,500],
'injection' : [1,2,3,4,5],
'to_usd' : [1.3167,1.3167,1.3167,1.3167,1.3167],
'to_ccy' : [0.009564,1,1,1.1093,1.1093],
'm5' : [2,4,6,8,10],
'm6' : [1,3,5,7,9]});
Pivot the DataFrame
# sortlevel() was removed in pandas 1.0; sort_index(axis=1) is the replacement
df_pivot = df.pivot_table(index='index_date',columns=['portfolio','portfolio_ccy','reporting_ccy']).swaplevel(0, 1, axis=1).sort_index(axis=1)
Rename the columns
df_pivot.columns.names = ['portfolio','measures', 'portfolio_ccy', 'reporting_ccy']
This yields a pivoted representation of the data such that:
In terms of 4., what is the best approach for implementation, given that we have the exchange rates (xRates) for the currencies?
The aim is to create a DataFrame like the one derived here:
Create DataFrame
df1 = pd.DataFrame({'index_date' : ['12/07/2016','12/07/2016','12/07/2016','12/07/2016','12/07/2016'],
'portfolio' : ['A','B','C','D','E'],
'reporting_ccy' : ['JPY','USD','USD','EUR','EUR'],
'portfolio_ccy' : ['JPY','USD','USD','EUR','EUR'],
'amount' : [13767.2522, 263.34, 395.01, 474.785901, 593.4823763],
'injection' : [1,2,3,4,5],
'to_usd' : [0.009564, 1, 1, 1.1093, 1.1093],
'to_ccy' : [1.3167, 1.3167, 1.3167, 1.3167, 1.3167],
'm5' : [2,4,6,8,10],
'm6' : [1,3,5,7,9]});
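The amounts in df1 appear to follow from the figures in df as amount * to_usd / to_ccy, that is, GBP to USD and then USD to the portfolio currency. The sketch below reproduces them under that assumption. The name `converted` is illustrative, and only the columns needed for the conversion are kept.

```python
import pandas as pd

# Same figures as the original df, trimmed to the columns used here.
df = pd.DataFrame({
    'portfolio': ['A', 'B', 'C', 'D', 'E'],
    'portfolio_ccy': ['JPY', 'USD', 'USD', 'EUR', 'EUR'],
    'amount': [100, 200, 300, 400, 500],          # reported in GBP
    'to_usd': [1.3167] * 5,                       # GBP -> USD rate
    'to_ccy': [0.009564, 1, 1, 1.1093, 1.1093],   # portfolio_ccy -> USD rate
})

# Assumption: convert GBP -> USD, then USD -> portfolio currency.
converted = df.assign(
    reporting_ccy=df['portfolio_ccy'],
    amount=df['amount'] * df['to_usd'] / df['to_ccy'],
)
print(converted[['portfolio', 'reporting_ccy', 'amount']])
```

With these rates, portfolio B comes out at 263.34 USD, which matches the value entered by hand in df1.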
Concatenate & Pivot the DataFrames
df_concat = pd.concat([df,df1])
# sortlevel() was removed in pandas 1.0; sort_index(axis=1) is the replacement
df_pivot1 = df_concat.pivot_table(index='index_date',columns=['portfolio','portfolio_ccy','reporting_ccy']).swaplevel(0, 1, axis=1).sort_index(axis=1)
df_pivot1.columns.names = ['portfolio','measures', 'portfolio_ccy', 'reporting_ccy']
This now shows one measure in several currencies.
df_pivot1.xs(('amount', 'A'), level=('measures','portfolio'), drop_level=False, axis=1)
Question
Is there a better way, such as adding data directly to a MultiIndexed DataFrame at level 3 (df_pivot1.columns.get_level_values(3).unique())?
I would like to be able to iterate through each level and add new measures, either derived from other measures using df.assign() or by other methods.
The use case here is to add other currencies to the measures where applicable. The concatenation and re-pivot as above does not seem optimal.
A DataFrame with a multi-level index (MultiIndex) has hierarchical labels on its rows or columns. You can create a MultiIndex in several ways, including from a list of arrays using MultiIndex.from_arrays() and from an array of tuples using MultiIndex.from_tuples().
MultiIndex.from_tuples() converts a list of tuples into a MultiIndex; it is one of several ways to construct one.
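As a small illustration of the two constructors, the sketch below builds the same two-level column index both ways. The labels are borrowed from the question's data; the variable names are illustrative only.

```python
import pandas as pd

# One tuple per column: (portfolio, measure).
tuples = [('A', 'amount'), ('A', 'injection'), ('B', 'amount')]
cols_from_tuples = pd.MultiIndex.from_tuples(tuples, names=['portfolio', 'measures'])

# One array per level: all portfolios, then all measures.
arrays = [['A', 'A', 'B'], ['amount', 'injection', 'amount']]
cols_from_arrays = pd.MultiIndex.from_arrays(arrays, names=['portfolio', 'measures'])

# Either index can be used directly as the columns of a DataFrame.
small = pd.DataFrame([[100, 1, 200]], index=['12/07/2016'], columns=cols_from_tuples)
print(small)
```

Both calls produce an equal index; which one reads better depends on whether the labels are already grouped per column or per level.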
You can append df1 to df_pivot row by row instead of rebuilding the pivot from both frames concatenated together.
Appending to the existing DataFrame should use less memory than concatenating and rebuilding the pivot from scratch every time new data is received.
import pandas as pd
df = pd.DataFrame({'index_date' : ['12/07/2016','12/07/2016','12/07/2016','12/07/2016','12/07/2016'],
'portfolio' : ['A','B','C','D','E'],
'reporting_ccy' : ['GBP','GBP','GBP','GBP','GBP'],
'portfolio_ccy' : ['JPY','USD','USD','EUR','EUR'],
'amount' : [100,200,300,400,500],
'injection' : [1,2,3,4,5],
'to_usd' : [1.3167,1.3167,1.3167,1.3167,1.3167],
'to_ccy' : [0.009564,1,1,1.1093,1.1093],
'm5' : [2,4,6,8,10],
'm6' : [1,3,5,7,9]});
# %%
# sortlevel() was removed in pandas 1.0; sort_index(axis=1) is the replacement
df_pivot = df.pivot_table(index='index_date',columns=['portfolio','portfolio_ccy','reporting_ccy']).swaplevel(0, 1, axis=1).sort_index(axis=1)
df1 = pd.DataFrame({'index_date' : ['12/07/2016','12/07/2016','12/07/2016','12/07/2016','12/07/2016'],
'portfolio' : ['A','B','C','D','E'],
'reporting_ccy' : ['JPY','USD','USD','EUR','EUR'],
'portfolio_ccy' : ['JPY','USD','USD','EUR','EUR'],
'amount' : [13767.2522, 263.34, 395.01, 474.785901, 593.4823763],
'injection' : [1,2,3,4,5],
'to_usd' : [0.009564, 1, 1, 1.1093, 1.1093],
'to_ccy' : [1.3167, 1.3167, 1.3167, 1.3167, 1.3167],
'm5' : [2,4,6,8,10],
'm6' : [1,3,5,7,9]});
df_pivot.columns.names = ['portfolio','measures', 'portfolio_ccy', 'reporting_ccy']
# Instead of joining the two DataFrames, add df1 to df_pivot one row at a time.
for i in range(len(df1)):
    row = df1.iloc[i]
    for measure in 'amount injection m5 m6 to_ccy to_usd'.split():
        # .ix was removed in pandas 1.0; .loc creates the column if it does not exist yet
        df_pivot.loc[row.index_date, (row.portfolio, measure, row.portfolio_ccy, row.reporting_ccy)] = row[measure]
#%% check the end result
print(df_pivot.xs(('amount', 'A'),
level=('measures','portfolio'), drop_level=False, axis=1))
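As a possible alternative to the cell-by-cell loop, the converted amounts can be computed from the pivot itself and attached as new columns in one step. This is only a sketch: it assumes the conversion amount * to_usd / to_ccy, keeps just the columns needed for it, and the names `converted` and `df_wide` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'index_date': ['12/07/2016'] * 5,
    'portfolio': ['A', 'B', 'C', 'D', 'E'],
    'reporting_ccy': ['GBP'] * 5,
    'portfolio_ccy': ['JPY', 'USD', 'USD', 'EUR', 'EUR'],
    'amount': [100, 200, 300, 400, 500],
    'to_usd': [1.3167] * 5,
    'to_ccy': [0.009564, 1, 1, 1.1093, 1.1093],
})

names = ['portfolio', 'measures', 'portfolio_ccy', 'reporting_ccy']
df_pivot = (df.pivot_table(index='index_date',
                           columns=['portfolio', 'portfolio_ccy', 'reporting_ccy'])
              .swaplevel(0, 1, axis=1)
              .sort_index(axis=1))
df_pivot.columns.names = names

# Pull each measure out as a frame keyed by (portfolio, portfolio_ccy, reporting_ccy).
amount = df_pivot.xs('amount', level='measures', axis=1)
to_usd = df_pivot.xs('to_usd', level='measures', axis=1)
to_ccy = df_pivot.xs('to_ccy', level='measures', axis=1)

# Assumption: GBP -> USD -> portfolio currency; the columns align label by label.
converted = amount * to_usd / to_ccy

# Relabel: restore the measures level and set reporting_ccy to the portfolio currency.
converted.columns = pd.MultiIndex.from_tuples(
    [(portfolio, 'amount', ccy, ccy) for portfolio, ccy, _ in converted.columns],
    names=names,
)

# Attach the new columns without re-pivoting the source data.
df_wide = pd.concat([df_pivot, converted], axis=1).sort_index(axis=1)
print(df_wide.xs(('amount', 'A'), level=['measures', 'portfolio'],
                 drop_level=False, axis=1))
```

Sorting the columns after the concat keeps the MultiIndex lexsorted, which pandas prefers for label-based slicing.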