I'm trying to transform a DataFrame with a dynamic number of a_P columns looking like this
a1_P a2_P weight
0 33297.81 17407.93 14733.23
1 58895.18 43013.57 86954.04
into a new DataFrame, looking like this (sorted by P)
P weight
0 17407.93 14733.23
1 33297.81 14733.23
2 43013.57 86954.04
3 58895.18 86954.04
So what I'm trying so far is
names = ["a1", "a2"]
p = pd.DataFrame(columns=["P", "weight"])
for i in range(0, len(names)):
p += df[["{}_P".format(names[i]), "weight"]]
and to sort it afterwards but this does not work because columnnames are not identical I guess.
The pandas.melt function does something like what you want:
pd.melt(df, id_vars=['weight'], value_vars=['a1_P', 'a2_P'], value_name='P')
weight variable P
0 14733.23 a1_P 33297.81
1 86954.04 a1_P 58895.18
2 14733.23 a2_P 17407.93
3 86954.04 a2_P 43013.57
And of course, soring by P is easily done by appending a .sort('P') to the end of the melt statement.
pd.melt(df, id_vars=['weight'], value_vars=['a1_P', 'a2_P'], value_name='P').sort('P')
weight variable P
2 14733.23 a2_P 17407.93
0 14733.23 a1_P 33297.81
3 86954.04 a2_P 43013.57
1 86954.04 a1_P 58895.18
And if you want to be super dynamic, maybe generating the value_vars in this fancy way:
n_values = 2
value_vars = ["a{}_P".format(i+1) for i in range(0, n_values)]
pd.melt(df, id_vars=['weight'], value_vars=value_vars, value_name='P').sort('P')
To get the index to be [0, 1, 2, 3 ...], just use .reset_index(drop=True) either as a chained event, or like this:
df = pd.melt(df, id_vars=['weight'], value_vars=value_vars, value_name='P')
df.sort(inplace=True)
df.reset_index(drop=True, inplace=True)
I personally prefer inplace operations, because they are much much more memory efficient.
A possible solution using Pandas concat (http://pandas.pydata.org/pandas-docs/stable/merging.html):
import pandas as pd
df = pd.DataFrame.from_dict({'a1_P': [123.123, 342.123],
'a2_P': [232.12, 32.23],
'weight': [12312.23, 16232.3]})
cols = [x for x in df.columns if '_P' in x]
new = pd.concat([df[col] for col in cols])
oldidx = new.index
weights = df.loc[new.index, 'weight'].tolist()
new_df = pd.DataFrame.from_dict({'P': new,
'weight': weights})
new_df.sort(columns='P', inplace=True)
new_df.reset_index(drop=True, inplace=True)
print(new_df)
P weight
0 32.230 16232.30
1 123.123 12312.23
2 232.120 12312.23
3 342.123 16232.30
There is room for performance optimizations, but it should faster than a solution with explicit loops.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With