Create MultiIndex from pattern in column names

I'm trying to code a script that will take a DataFrame with an arbitrary number of experimental conditions (e.g., 3 different concentrations of a drug) and an arbitrary number of replicates of each condition (i.e., trials 1-3) that looks like this:

      100_uM_Drug_Trial_1  100_uM_Drug_Trial_2  10_uM_Drug_Trial_1  \
0             459.924747          635.685284         518.163653   
1             459.458934          636.249568         518.445279   
2             460.006374          636.435523         518.743388   
3             460.002453          636.794022         518.895792   
4             460.598404          636.103206         518.836557   
5             460.309564          637.187444         518.976234   
6             460.609499          636.335023         519.005662   
7             460.843505          637.123839         519.041012   
8             460.969187          637.047453         518.880728   
9             460.832477          637.231533         519.108122   
10            461.255201          638.176752         518.979086   
11            461.310764          636.924448         518.979923   
12            461.507783          637.824450         519.117064   
13            461.116555          637.145600         519.106675   
14            461.891845          638.136241         519.531348   
15            461.746859          637.819223         519.161308   
16            461.840650          637.977134         519.203945   
17            462.028374          638.474671         519.184845   
18            461.726244          638.039615         519.225926   
19            462.128634          638.624309         519.177030   
20            461.242868          637.636891         519.460114   
21            462.201164          638.493620         519.469176   
22            464.078771          637.749872         519.505141   
23            464.605662          639.119425         519.654590   
24            464.352002          638.789306         519.947157   
25            464.485028          638.656634         519.822459   
26            464.506035          639.428889         519.906759   
27            464.834154          638.481042         520.143631   
28            464.886412          639.267176         520.218972   
29            465.414446          638.661687         520.384017 

...and multiindex it by both condition and trial so it looks like this:

Condition     100_uM_Drug                            10_uM_Drug
Trial         1                   2                  1
0             459.924747          635.685284         518.163653   
1             459.458934          636.249568         518.445279   
2             460.006374          636.435523         518.743388   
3             460.002453          636.794022         518.895792   
4             460.598404          636.103206         518.836557   
5             460.309564          637.187444         518.976234   
6             460.609499          636.335023         519.005662   
7             460.843505          637.123839         519.041012   
8             460.969187          637.047453         518.880728   
9             460.832477          637.231533         519.108122   
10            461.255201          638.176752         518.979086   
11            461.310764          636.924448         518.979923   
12            461.507783          637.824450         519.117064   
13            461.116555          637.145600         519.106675   
14            461.891845          638.136241         519.531348   
15            461.746859          637.819223         519.161308   
16            461.840650          637.977134         519.203945   
17            462.028374          638.474671         519.184845   
18            461.726244          638.039615         519.225926   
19            462.128634          638.624309         519.177030   
20            461.242868          637.636891         519.460114   
21            462.201164          638.493620         519.469176   
22            464.078771          637.749872         519.505141   
23            464.605662          639.119425         519.654590   
24            464.352002          638.789306         519.947157   
25            464.485028          638.656634         519.822459   
26            464.506035          639.428889         519.906759   
27            464.834154          638.481042         520.143631   
28            464.886412          639.267176         520.218972   
29            465.414446          638.661687         520.384017 

I've tried a few approaches including filtering column names by a regex, but I haven't gotten anything to work. Is there a quick and easy way to do this that I missed?

Thx

rchurt asked Mar 31 '26 08:03


1 Answer

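You could use MultiIndex.from_tuples() while splitting the column names on underscores, taking the first three pieces as the condition and the last piece as the trial number (see docs):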

df.columns = pd.MultiIndex.from_tuples([('_'.join(col.split('_')[:3]), col.split('_')[-1]) for col in df.columns], names=['Drug', 'Trial'])

produces:

Drug  100_uM_Drug              10_uM_Drug
Trial           1           2           1
0      459.924747  635.685284  518.163653
1      459.458934  636.249568  518.445279
2      460.006374  636.435523  518.743388
3      460.002453  636.794022  518.895792
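
Since the question mentions an arbitrary number of conditions, the fixed three-piece split may not fit every condition name. Below is a minimal sketch of a regex-based variant, assuming every column follows the pattern <condition>_Trial_<number>. The small sample frame and the level names Condition and Trial are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample frame that follows the question's column naming scheme.
df = pd.DataFrame(
    {
        "100_uM_Drug_Trial_1": [459.924747, 459.458934],
        "100_uM_Drug_Trial_2": [635.685284, 636.249568],
        "10_uM_Drug_Trial_1": [518.163653, 518.445279],
    }
)

# Everything before "_Trial_" becomes the condition; the trailing digits
# become the trial. Named groups turn into the column names of `parts`.
parts = df.columns.str.extract(r"^(?P<Condition>.+)_Trial_(?P<Trial>\d+)$")

# Columns that do not match the assumed pattern come back as NaN.
if parts.isna().any().any():
    raise ValueError("some column names do not match '<condition>_Trial_<n>'")

# Store trial numbers as integers rather than strings.
parts["Trial"] = parts["Trial"].astype(int)

# Build the two-level column index; level names come from the frame's columns.
df.columns = pd.MultiIndex.from_frame(parts)

# Usage: all trials of one condition, or one trial across all conditions.
one_condition = df["100_uM_Drug"]
first_trials = df.xs(1, axis=1, level="Trial")
```

Matching on the literal "_Trial_" separator instead of counting underscores means condition names with more or fewer pieces should still parse, as long as the suffix keeps the same form.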
Stefan answered Apr 02 '26 22:04


