
Pandas: repeat dataframe columns based on a reference dict

I need to rename and repeat my dataframe columns based on a reference dict. Below I have created a dummy dataframe:

import pandas as pd

rawdata = {'id':['json','molly','tina','jake','molly'],'entity':['present','absent','absent','present','present'],'entity2':['present','present','present','absent','absent'],'entity3':['absent','absent','absent','present','absent']}
df = pd.DataFrame(rawdata)
df = df.set_index('id')  # set_index returns a new frame, so assign the result
df

        entity  entity2  entity3
id                              
json   present  present   absent
molly   absent  present   absent
tina    absent  present   absent
jake   present   absent  present
molly  present   absent   absent

Now I have the following example dict:

ref_dict= {'entity':['entity_exp1'],'entity2':['entity2_exp1','entity2_exp2'],'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']}

I now need to rename the columns using the dict values. If a column maps to more than one value, then the column should be repeated once per value. The following is my desired dataframe:

      entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id                                                                                
json      present      present      present       absent       absent       absent
molly      absent      present      present       absent       absent       absent
tina       absent      present      present       absent       absent       absent
jake      present       absent       absent      present      present      present
molly     present       absent       absent       absent       absent       absent
asked May 31 '26 07:05 by Rtut


1 Answer

Option 1
Use pd.concat on a dictionary comprehension

pd.concat({new: df[old] for old, news in ref_dict.items() for new in news}, axis=1)

      entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3 entity_exp1
id                                                                                
json       present      present       absent       absent       absent     present
molly      present      present       absent       absent       absent      absent
tina       present      present       absent       absent       absent      absent
jake        absent       absent      present      present      present     present
molly       absent       absent       absent       absent       absent     present
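
The columns above come out in a different order from the one requested in the question. A minimal sketch of one way to pin the order, assuming the frame and dict from the question: build the list of new names by walking the original columns, then select the concatenated result with that list. The names pieces, ordered and result are only illustrative.

```python
import pandas as pd

rawdata = {
    'id': ['json', 'molly', 'tina', 'jake', 'molly'],
    'entity': ['present', 'absent', 'absent', 'present', 'present'],
    'entity2': ['present', 'present', 'present', 'absent', 'absent'],
    'entity3': ['absent', 'absent', 'absent', 'present', 'absent'],
}
df = pd.DataFrame(rawdata).set_index('id')

ref_dict = {
    'entity': ['entity_exp1'],
    'entity2': ['entity2_exp1', 'entity2_exp2'],
    'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3'],
}

# One Series per new name, keyed by that new name.
pieces = {new: df[old] for old in df.columns for new in ref_dict[old]}
result = pd.concat(pieces, axis=1)

# New names in the order of the original columns; selecting with this list
# makes the column order independent of how concat orders the dict keys.
ordered = [new for old in df.columns for new in ref_dict[old]]
result = result[ordered]

print(result)
```

Selecting with an explicit list costs one extra line but removes any reliance on dict or concat ordering behaviour.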

Option 2
Slice the dataframe and rename columns

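# number of new names each original column maps to
repeats = [len(ref_dict[col]) for col in df.columns]
# reindex_axis was removed in pandas 1.0; reindex(columns=...) does the same job
d1 = df.reindex(columns=df.columns.repeat(repeats))
# flatten the lists of new names, in column order
d1.columns = [new for col in df.columns for new in ref_dict[col]]
d1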

      entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id                                                                                
json      present      present      present       absent       absent       absent
molly      absent      present      present       absent       absent       absent
tina       absent      present      present       absent       absent       absent
jake      present       absent       absent      present      present      present
molly     present       absent       absent       absent       absent       absent
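
If this is needed in more than one place, the slice-and-rename idea can be wrapped in a small helper. This is only a sketch: expand_columns is a hypothetical name, and it assumes every column of the frame has an entry in the mapping (a missing key raises KeyError).

```python
import pandas as pd


def expand_columns(frame, mapping):
    """Repeat each column once per new name in mapping[column], then rename."""
    # Original label repeated once for every new name it maps to.
    old_names = [old for old in frame.columns for _ in mapping[old]]
    # The new names, flattened in the same order.
    new_names = [new for old in frame.columns for new in mapping[old]]
    # Selecting with repeated labels duplicates the columns.
    out = frame.loc[:, old_names].copy()
    out.columns = new_names
    return out


rawdata = {
    'id': ['json', 'molly', 'tina', 'jake', 'molly'],
    'entity': ['present', 'absent', 'absent', 'present', 'present'],
    'entity2': ['present', 'present', 'present', 'absent', 'absent'],
    'entity3': ['absent', 'absent', 'absent', 'present', 'absent'],
}
df = pd.DataFrame(rawdata).set_index('id')

ref_dict = {
    'entity': ['entity_exp1'],
    'entity2': ['entity2_exp1', 'entity2_exp2'],
    'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3'],
}

expanded = expand_columns(df, ref_dict)
print(expanded)
```

The helper returns a copy, so the original frame keeps its three columns and its original names.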
answered Jun 03 '26 00:06 by piRSquared