My dataframe is as below,
_dict = {'t_head': ['H1', 'H2', 'H3', 'H4', 'H5','H6'],
'r_head': ['Revenue', 'Revenue', 'Income', 'Income', 'Cash', 'Expenses'],
'3ME__ Q219': [159.9, '', 45.6, '', '', ''],
'3ME__ Q218': [112.3, '', 27.2, '', '', ''],
'3ME__ Q119': [121.0, '', 23.1, '', '', ''],
'3ME__ Q18': [85.7, '', 15.3, '', '', ''],
'3ME__ Q418': [160.5, '', 51.1, '', '', ''],
'9ME__ Q417': [102.6, '', 24.2, '', '', ''],
'9ME__ Q318': [118.8, '', 30.2, '', '', ''],
'9ME__ Q317': [79.4, '', 15.3, '', '', ''],
'6ME__ Q219': ['', 280.9, '', 68.7, '', ''],
'6ME__ Q218': ['', 198.0, '', 42.6, '', ''],
'Q219': ['', '', '', '', 1305, 1239],
'Q418': ['', '', '', '', 2072, 1117]
}
df = pd.DataFrame.from_dict(_dict)
print(df)
t_head r_head 3ME__ Q219 3ME__ Q218 3ME__ Q119 3ME__ Q18 3ME__ Q418 9ME__ Q417 9ME__ Q318 9ME__ Q317 6ME__ Q219 6ME__ Q218 Q219 Q418
0 H1 Revenue 159.9 112.3 121 85.7 160.5 102.6 118.8 79.4
1 H2 Revenue 280.9 198
2 H3 Income 45.6 27.2 23.1 15.3 51.1 24.2 30.2 15.3
3 H4 Income 68.7 42.6
4 H5 Cash 1305 2072
5 H6 Expenses 1239 1117
I want to split this dataframe into multiple dtaframes base on column heading. Here column headings can start with 3ME__,6ME__,9ME__ (all/any/none can be present) or other values. i want to all columns starting with 3ME__ to be in one dataframe,6ME__ to another...etc. and the all of the rest to be in a fourth dataframe.
What i had tried is as below,
df1 = df.filter(regex='3ME__')
if not df1.empty:
df1 = df1[df1.iloc[:,0].astype(bool)]
df2 = df.filter(regex='6ME__')
if not df2.empty:
df2 = df2[df2.iloc[:,0].astype(bool)]
df3 = df.filter(regex='9ME__')
if not df3.empty:
df3 = df3[df3.iloc[:,0].astype(bool)]
Here i am able to filter out column names starting with3ME__,6ME__ & 9ME__ to different dataframes, but not able to get the rest of column headings to one dataframe.
1.) How to get the rest of column headings to one dataframe?
2.) Is there any simpler method to split into dictionary with a key and dataframes as values?
Please Help.
I would do the below:
m=df.set_index(['t_head','r_head']) #set the 2 columns as index
Then split columns and group by on axis 1 and make a dict with each group
d={f'df_{i}': g for i, g in m.groupby(m.columns.str.split('_').str[0],axis=1)}
Then call each key to access this dictionary:
print(d['df_3ME'])
Based on further discussion we do the same operation but with a condition:
cond=df.columns.str.contains('__') #check if cols have double _
d={f'df_{i}':g for i, g in
df.loc[:,cond].groupby(df.loc[:,cond].columns.str.split('__').str[0],axis=1)}
d.update({'Misc':df.loc[:,~cond]}) #update the dict with all that doesnt meet condition
print(d['df_3ME'])
3ME__ Q219 3ME__ Q218 3ME__ Q119 3ME__ Q18 3ME__ Q418
0 159.9 112.3 121 85.7 160.5
1
2 45.6 27.2 23.1 15.3 51.1
3
4
5
print(d['Misc'])
t_head r_head Q219 Q418
0 H1 Revenue
1 H2 Revenue
2 H3 Income
3 H4 Income
4 H5 Cash 1305 2072
5 H6 Expenses 1239 1117
You can retreive the column names of your created dataframes and select by the columns that are not in it:
other_columns = [x for x in df.columns if x not in (list(df1.columns) + list(df2.columns) + list(df3.columns))]
other_df = df[other_columns]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With