Apologies for the vague question title, but I'm not really sure what to call this operation.
I have the following data frame:
import pandas as pd
df = pd.DataFrame({
'A': [1, 3, 2, 1, 2],
'B': [2, 1, 3, 2, 3],
'C': [3, 2, 1, 3, 1],
})
print(df)
# A B C
# 0 1 2 3
# 1 3 1 2
# 2 2 3 1
# 3 1 2 3
# 4 2 3 1
This data represents a "ranking" of each of the options, A, B and C, for each row. So, for example, in row 2, C was the best, then A, then B. I would like to construct the "inverted" data frame, where, for each row, I have three columns for the 1, 2 and 3 positions of the ranking, with A, B and C now being the data. So, for the example above, the result would be:
out = pd.DataFrame({
1: ['A', 'B', 'C', 'A', 'C'],
2: ['B', 'C', 'A', 'B', 'A'],
3: ['C', 'A', 'B', 'C', 'B'],
})
print(out)
# 1 2 3
# 0 A B C
# 1 B C A
# 2 C A B
# 3 A B C
# 4 C A B
Ideally, each row in df should have the three distinct values 1, 2 and 3, but there may be cases with repeated values (values outside that range don't need to be considered). If possible at all, I would like to resolve this by "concatenating" the names of the options in the same position, and having empty strings or NaN in missing positions. For example, with this input:
df_bad = pd.DataFrame({'A': [1], 'B': [2], 'C': [2]})
print(df_bad)
# A B C
# 0 1 2 2
I would ideally want to get this output:
out_bad = pd.DataFrame({1: ['A'], 2: ['BC'], 3: ['']})
print(out_bad)
# 1 2 3
# 0 A BC
Alternatively, I could settle for just getting one of the values instead of the concatenation.
I have been looking through melt, pivot, pivot_table and other functions but I can't figure out a way to get the result I want.
You can use argsort:
import numpy as np
pd.DataFrame(df.columns.values[np.argsort(df.values)])
0 1 2
0 A B C
1 B C A
2 C A B
3 A B C
4 C A B
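Note that the result above has columns labelled 0, 1, 2 rather than 1, 2, 3. A minimal sketch of the same argsort idea with the column labels adjusted is below. It assumes every row is a permutation of 1..n; with repeated ranks argsort still returns one option per position and does not concatenate ties. The names order and out are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 3, 2, 1, 2],
    'B': [2, 1, 3, 2, 3],
    'C': [3, 2, 1, 3, 1],
})

# For each row, argsort gives the column positions ordered by rank value
order = np.argsort(df.to_numpy(), axis=1)

# Index the column labels with those positions and label the
# result columns 1..n so they match the ranking positions
out = pd.DataFrame(
    df.columns.to_numpy()[order],
    index=df.index,
    columns=range(1, df.shape[1] + 1),
)
print(out)
```

This stays entirely in NumPy, so it should be fast on large frames, but it is only appropriate when the ranks in each row are distinct.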
Here is one way, using stack:
df.stack().reset_index(level=1).set_index(0,append=True)['level_1'].unstack()
Out[89]:
0 1 2 3
0 A B C
1 B C A
2 C A B
3 A B C
4 C A B
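Another way, which also handles repeated ranks by concatenating the option names (note that row 3 of C is changed to 2 here to create a tie):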
df = pd.DataFrame({
'A': [1, 3, 2, 1, 2],
'B': [2, 1, 3, 2, 3],
'C': [3, 2, 1, 2, 1],
})
(df.stack()
.reset_index()
.groupby(['level_0',0])
.level_1.apply(''.join)
.unstack()
)
Output:
0 1 2 3
level_0
0 A B C
1 B C A
2 C A B
3 A BC NaN
4 C A B
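To get closer to the out_bad layout from the question, the groupby result can be tidied up a little. The sketch below applies the same stack/groupby idea to df_bad, then reindexes the columns so that every position 1, 2, 3 is present and replaces NaN with empty strings. The intermediate name stacked and the column names row, option and rank are my own choices, not anything required by pandas.

```python
import pandas as pd

df_bad = pd.DataFrame({'A': [1], 'B': [2], 'C': [2]})

# Long format: one line per (row, option) pair with its rank
stacked = df_bad.stack().reset_index()
stacked.columns = ['row', 'option', 'rank']

out_bad = (
    stacked.groupby(['row', 'rank'])['option']
    .agg(''.join)                 # concatenate options that share a rank
    .unstack()                    # ranks become columns
    .reindex(columns=[1, 2, 3])   # make sure unused positions still appear
    .fillna('')                   # empty string instead of NaN
)
print(out_bad)
```

If NaN is preferred for missing positions, dropping the final fillna call should be enough.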