Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas - melt, stack, reshape or MultiIndex dataframe columns based on multiple attributes from the column name

I have a Pandas dataframe with data in a very wide form... for example:

ID  Equipment   Function    Task    exprt_cond1_time    exprt_cond2_time    exprt_cond1_freq    exprt_cond2_freq    novce_cond1_time    novce_cond2_time    novce_cond1_freq    novce_cond2_freq
0   eq_type_1   Fxn_a       task_1  12                  24                  0.031               0.055               15                  31                  0.042               0.059
1   eq_type_1   Fxn_a       task_2  10                  22                  0.028               0.052               12                  29                  0.039               0.055
2   eq_type_1   Fxn_b       task_3  13                  25                  0.033               0.057               18                  34                  0.047               0.062
3   eq_type_1   Fxn_b       task_4  9                   19                  0.027               0.051               10                  28                  0.038               0.054
4   eq_type_2   Fxn_a       task_1  14                  27                  0.036               0.056               16                  32                  0.043               0.061
5   eq_type_2   Fxn_a       task_2  11                  26                  0.030               0.054               14                  30                  0.041               0.058

but I am wanting to convert it to a more tidy long format using the text in the column label to make new columns...e.g., data from the first and last rows from the above might look something more like this:

ID  Equipment   Function    Task    Experience  Condition   Time    Freq
0   eq_type_1   Fxn_a       task_1  expert      cond1       12      0.031
1   eq_type_1   Fxn_a       task_1  expert      cond2       24      0.055
2   eq_type_1   Fxn_a       task_1  novice      cond1       15      0.042
3   eq_type_1   Fxn_a       task_1  novice      cond2       31      0.059
...
16  eq_type_2   Fxn_a       task_2  expert      cond1       11      0.030
17  eq_type_2   Fxn_a       task_2  expert      cond2       26      0.054
18  eq_type_2   Fxn_a       task_2  novice      cond1       14      0.041
19  eq_type_2   Fxn_a       task_2  novice      cond2       30      0.058

I can't figure out the right combination of melt / stack / reshape / MultiIndex or other translation functions to make this happen efficiently, or without my code becoming ugly, unwieldy, & nearly unreadable. This question and this question are close and help me some, but they only seem to convert based on a single attribute in the label. Would love any help or tips from the SO community!

like image 680
CJH Avatar asked Dec 14 '22 20:12

CJH


1 Answers

Let's try pd.wide_to_long twice with some column renaming make it all possible:

rename_d = {'exprt_cond1_time':'Time_exprt_cond1',
        'exprt_cond2_time':'Time_exprt_cond2',
        'exprt_cond1_freq':'Freq_exprt_cond1',
        'exprt_cond2_freq':'Freq_exprt_cond2',
        'novce_cond1_time':'Time_novce_cond1',
        'novce_cond2_time':'Time_novce_cond2',
        'novce_cond1_freq':'Freq_novce_cond1',
        'novce_cond2_freq':'Freq_novce_cond2'}

f = df.rename(columns=rename_d)

df1 = pd.wide_to_long(df, ['Time_exprt','Freq_exprt','Time_novce','Freq_novce'],i=['Equipment','Function','Task'],j='Condition',sep='_',suffix='.')

df1 = df1.reset_index()

df_out = pd.wide_to_long(df1,['Time','Freq'],i=['Equipment','Function','Task','Condition'],j='Experience',sep='_',suffix='').reset_index().drop('ID',axis=1)

Output:

    Equipment Function    Task Condition Experience  Time   Freq
0   eq_type_1    Fxn_a  task_1     cond1    exprt    12  0.031
1   eq_type_1    Fxn_a  task_1     cond1    novce    15  0.042
2   eq_type_1    Fxn_a  task_1     cond2    exprt    24  0.055
3   eq_type_1    Fxn_a  task_1     cond2    novce    31  0.059
4   eq_type_1    Fxn_a  task_2     cond1    exprt    10  0.028
5   eq_type_1    Fxn_a  task_2     cond1    novce    12  0.039
6   eq_type_1    Fxn_a  task_2     cond2    exprt    22  0.052
7   eq_type_1    Fxn_a  task_2     cond2    novce    29  0.055
8   eq_type_1    Fxn_b  task_3     cond1    exprt    13  0.033
9   eq_type_1    Fxn_b  task_3     cond1    novce    18  0.047
10  eq_type_1    Fxn_b  task_3     cond2    exprt    25  0.057
11  eq_type_1    Fxn_b  task_3     cond2    novce    34  0.062
12  eq_type_1    Fxn_b  task_4     cond1    exprt     9  0.027
13  eq_type_1    Fxn_b  task_4     cond1    novce    10  0.038
14  eq_type_1    Fxn_b  task_4     cond2    exprt    19  0.051
15  eq_type_1    Fxn_b  task_4     cond2    novce    28  0.054
16  eq_type_2    Fxn_a  task_1     cond1    exprt    14  0.036
17  eq_type_2    Fxn_a  task_1     cond1    novce    16  0.043
18  eq_type_2    Fxn_a  task_1     cond2    exprt    27  0.056
19  eq_type_2    Fxn_a  task_1     cond2    novce    32  0.061
20  eq_type_2    Fxn_a  task_2     cond1    exprt    11  0.030
21  eq_type_2    Fxn_a  task_2     cond1    novce    14  0.041
22  eq_type_2    Fxn_a  task_2     cond2    exprt    26  0.054
23  eq_type_2    Fxn_a  task_2     cond2    novce    30  0.058

pd.wide_to_long handles "simultaneous melts" in Pandas. First, we need to rename those columns to make the stubnames in pd.wide_to_long work.

like image 56
Scott Boston Avatar answered Feb 09 '23 01:02

Scott Boston