How to split a dataframe to multiple dataframes bases on column names

Question

My dataframe is as below,

 _dict = {'t_head': ['H1', 'H2', 'H3', 'H4', 'H5','H6'], 
            'r_head': ['Revenue', 'Revenue', 'Income', 'Income', 'Cash', 'Expenses'], 
            '3ME__ Q219': [159.9, '', 45.6, '', '', ''], 
            '3ME__ Q218': [112.3, '', 27.2, '', '', ''], 
            '3ME__ Q119': [121.0, '', 23.1, '', '', ''], 
            '3ME__ Q18': [85.7, '', 15.3, '', '', ''], 
            '3ME__ Q418': [160.5, '', 51.1, '', '', ''], 
            '9ME__ Q417': [102.6, '', 24.2, '', '', ''], 
            '9ME__ Q318': [118.8, '', 30.2, '', '', ''], 
            '9ME__ Q317': [79.4, '', 15.3, '', '', ''], 
            '6ME__ Q219': ['', 280.9, '', 68.7, '', ''], 
            '6ME__ Q218': ['', 198.0, '', 42.6, '', ''], 
            'Q219': ['', '', '', '', 1305, 1239], 
            'Q418': ['', '', '', '', 2072, 1117]
            }
df = pd.DataFrame.from_dict(_dict)
print(df)  

  t_head    r_head 3ME__ Q219 3ME__ Q218 3ME__ Q119 3ME__ Q18 3ME__ Q418 9ME__ Q417 9ME__ Q318 9ME__ Q317 6ME__ Q219 6ME__ Q218  Q219  Q418
0     H1   Revenue      159.9      112.3        121      85.7      160.5      102.6      118.8       79.4                                  
1     H2   Revenue                                                                                             280.9        198            
2     H3    Income       45.6       27.2       23.1      15.3       51.1       24.2       30.2       15.3                                  
3     H4    Income                                                                                              68.7       42.6            
4     H5      Cash                                                                                                               1305  2072
5     H6  Expenses                                                                                                               1239  1117

I want to split this dataframe into multiple dtaframes base on column heading. Here column headings can start with 3ME__,6ME__,9ME__ (all/any/none can be present) or other values. i want to all columns starting with 3ME__ to be in one dataframe,6ME__ to another...etc. and the all of the rest to be in a fourth dataframe.
What i had tried is as below,

df1 = df.filter(regex='3ME__')
if not df1.empty:
    df1 = df1[df1.iloc[:,0].astype(bool)]
df2 = df.filter(regex='6ME__')
if not df2.empty:
    df2 = df2[df2.iloc[:,0].astype(bool)]
df3 = df.filter(regex='9ME__')
if not df3.empty:
    df3 = df3[df3.iloc[:,0].astype(bool)]

Here i am able to filter out column names starting with3ME__,6ME__ & 9ME__ to different dataframes, but not able to get the rest of column headings to one dataframe.

1.) How to get the rest of column headings to one dataframe?
2.) Is there any simpler method to split into dictionary with a key and dataframes as values?

Please Help.

anky · Accepted Answer

I would do the below:

m=df.set_index(['t_head','r_head']) #set the 2 columns as index

Then split columns and group by on axis 1 and make a dict with each group

d={f'df_{i}': g for i, g in m.groupby(m.columns.str.split('_').str[0],axis=1)}

Then call each key to access this dictionary:

print(d['df_3ME'])

Based on further discussion we do the same operation but with a condition:

cond=df.columns.str.contains('__') #check if cols have double _
d={f'df_{i}':g for i, g in 
   df.loc[:,cond].groupby(df.loc[:,cond].columns.str.split('__').str[0],axis=1)}
d.update({'Misc':df.loc[:,~cond]}) #update the dict with all that doesnt meet condition
print(d['df_3ME'])

  3ME__ Q219 3ME__ Q218 3ME__ Q119 3ME__ Q18 3ME__ Q418
0      159.9      112.3        121      85.7      160.5
1                                                      
2       45.6       27.2       23.1      15.3       51.1
3                                                      
4                                                      
5

print(d['Misc'])

  t_head    r_head  Q219  Q418
0     H1   Revenue            
1     H2   Revenue            
2     H3    Income            
3     H4    Income            
4     H5      Cash  1305  2072
5     H6  Expenses  1239  1117

SmileyProd · Answer

You can retreive the column names of your created dataframes and select by the columns that are not in it:

other_columns = [x for x in df.columns if x not in (list(df1.columns) + list(df2.columns) + list(df3.columns))]

other_df = df[other_columns]

How to split a dataframe to multiple dataframes bases on column names

Tags:

python

dictionary

python-3.x

pandas

dataframe

Shijith

2 Answers

anky

SmileyProd

Recent Activity

Donate For Us

How to split a dataframe to multiple dataframes bases on column names

Tags:

python

dictionary

python-3.x

pandas

dataframe

Shijith

2 Answers

anky

SmileyProd

Related questions

Recent Activity

Donate For Us