I have a Pandas dataframe with data in a very wide form... for example:
ID Equipment Function Task exprt_cond1_time exprt_cond2_time exprt_cond1_freq exprt_cond2_freq novce_cond1_time novce_cond2_time novce_cond1_freq novce_cond2_freq
0 eq_type_1 Fxn_a task_1 12 24 0.031 0.055 15 31 0.042 0.059
1 eq_type_1 Fxn_a task_2 10 22 0.028 0.052 12 29 0.039 0.055
2 eq_type_1 Fxn_b task_3 13 25 0.033 0.057 18 34 0.047 0.062
3 eq_type_1 Fxn_b task_4 9 19 0.027 0.051 10 28 0.038 0.054
4 eq_type_2 Fxn_a task_1 14 27 0.036 0.056 16 32 0.043 0.061
5 eq_type_2 Fxn_a task_2 11 26 0.030 0.054 14 30 0.041 0.058
but I am wanting to convert it to a more tidy long format using the text in the column label to make new columns...e.g., data from the first and last rows from the above might look something more like this:
ID Equipment Function Task Experience Condition Time Freq
0 eq_type_1 Fxn_a task_1 expert cond1 12 0.031
1 eq_type_1 Fxn_a task_1 expert cond2 24 0.055
2 eq_type_1 Fxn_a task_1 novice cond1 15 0.042
3 eq_type_1 Fxn_a task_1 novice cond2 31 0.059
...
16 eq_type_2 Fxn_a task_2 expert cond1 11 0.030
17 eq_type_2 Fxn_a task_2 expert cond2 26 0.054
18 eq_type_2 Fxn_a task_2 novice cond1 14 0.041
19 eq_type_2 Fxn_a task_2 novice cond2 30 0.058
I can't figure out the right combination of melt / stack / reshape / MultiIndex or other translation functions to make this happen efficiently, or without my code becoming ugly, unwieldy, & nearly unreadable. This question and this question are close and help me some, but they only seem to convert based on a single attribute in the label. Would love any help or tips from the SO community!
Let's try pd.wide_to_long
twice with some column renaming make it all possible:
rename_d = {'exprt_cond1_time':'Time_exprt_cond1',
'exprt_cond2_time':'Time_exprt_cond2',
'exprt_cond1_freq':'Freq_exprt_cond1',
'exprt_cond2_freq':'Freq_exprt_cond2',
'novce_cond1_time':'Time_novce_cond1',
'novce_cond2_time':'Time_novce_cond2',
'novce_cond1_freq':'Freq_novce_cond1',
'novce_cond2_freq':'Freq_novce_cond2'}
f = df.rename(columns=rename_d)
df1 = pd.wide_to_long(df, ['Time_exprt','Freq_exprt','Time_novce','Freq_novce'],i=['Equipment','Function','Task'],j='Condition',sep='_',suffix='.')
df1 = df1.reset_index()
df_out = pd.wide_to_long(df1,['Time','Freq'],i=['Equipment','Function','Task','Condition'],j='Experience',sep='_',suffix='').reset_index().drop('ID',axis=1)
Output:
Equipment Function Task Condition Experience Time Freq
0 eq_type_1 Fxn_a task_1 cond1 exprt 12 0.031
1 eq_type_1 Fxn_a task_1 cond1 novce 15 0.042
2 eq_type_1 Fxn_a task_1 cond2 exprt 24 0.055
3 eq_type_1 Fxn_a task_1 cond2 novce 31 0.059
4 eq_type_1 Fxn_a task_2 cond1 exprt 10 0.028
5 eq_type_1 Fxn_a task_2 cond1 novce 12 0.039
6 eq_type_1 Fxn_a task_2 cond2 exprt 22 0.052
7 eq_type_1 Fxn_a task_2 cond2 novce 29 0.055
8 eq_type_1 Fxn_b task_3 cond1 exprt 13 0.033
9 eq_type_1 Fxn_b task_3 cond1 novce 18 0.047
10 eq_type_1 Fxn_b task_3 cond2 exprt 25 0.057
11 eq_type_1 Fxn_b task_3 cond2 novce 34 0.062
12 eq_type_1 Fxn_b task_4 cond1 exprt 9 0.027
13 eq_type_1 Fxn_b task_4 cond1 novce 10 0.038
14 eq_type_1 Fxn_b task_4 cond2 exprt 19 0.051
15 eq_type_1 Fxn_b task_4 cond2 novce 28 0.054
16 eq_type_2 Fxn_a task_1 cond1 exprt 14 0.036
17 eq_type_2 Fxn_a task_1 cond1 novce 16 0.043
18 eq_type_2 Fxn_a task_1 cond2 exprt 27 0.056
19 eq_type_2 Fxn_a task_1 cond2 novce 32 0.061
20 eq_type_2 Fxn_a task_2 cond1 exprt 11 0.030
21 eq_type_2 Fxn_a task_2 cond1 novce 14 0.041
22 eq_type_2 Fxn_a task_2 cond2 exprt 26 0.054
23 eq_type_2 Fxn_a task_2 cond2 novce 30 0.058
pd.wide_to_long
handles "simultaneous melts" in Pandas. First, we need to rename those columns to make the stubnames in pd.wide_to_long work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With