
Turning values into columns

Apologies for the vague question title, but I'm not really sure what to call this operation.

I have the following data frame:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 3, 2, 1, 2],
    'B': [2, 1, 3, 2, 3],
    'C': [3, 2, 1, 3, 1],
})
print(df)
#    A  B  C
# 0  1  2  3
# 1  3  1  2
# 2  2  3  1
# 3  1  2  3
# 4  2  3  1

This data represents a "ranking" of the options A, B and C for each row. So, for example, in row 2, C was the best, then A, then B. I would like to construct the "inverted" data frame, where each row has three columns for positions 1, 2 and 3 of the ranking, with A, B and C now being the data. So, for the example above, the result would be:

out = pd.DataFrame({
    1: ['A', 'B', 'C', 'A', 'C'],
    2: ['B', 'C', 'A', 'B', 'A'],
    3: ['C', 'A', 'B', 'C', 'B'],
})
print(out)
#    1  2  3
# 0  A  B  C
# 1  B  C  A
# 2  C  A  B
# 3  A  B  C
# 4  C  A  B

Ideally, each row in df should have the three distinct values 1, 2 and 3, but there may be cases with repeated values (values outside that range don't need to be considered). If at all possible, I would like to resolve this by "concatenating" the names of the options in the same position, and having empty strings or NaN in missing positions. For example, with this input:

df_bad = pd.DataFrame({'A': [1], 'B': [2], 'C': [2]})
print(df_bad)
#    A  B  C
# 0  1  2  2

I would ideally want to get this output:

out_bad = pd.DataFrame({1: ['A'], 2: ['BC'], 3: ['']})
print(out_bad)
#    1   2 3
# 0  A  BC

Alternatively, I could settle for just getting one of the values instead of the concatenation.

I have been looking through melt, pivot, pivot_table and other functions but I can't figure out the way to get the result I want.

jdehesa asked Sep 10 '19 16:09


3 Answers

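You can use argsort, provided every row is a permutation of 1, 2 and 3 (tied ranks are not concatenated, and the result columns are labelled 0, 1, 2):

import numpy as np

pd.DataFrame(df.columns.values[np.argsort(df.values)])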

   0  1  2
0  A  B  C
1  B  C  A
2  C  A  B
3  A  B  C
4  C  A  B
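
If you want the columns labelled 1, 2, 3 as in the question, a minimal sketch along the same lines could look like this. It assumes every row is a permutation of 1..n; the names `order` and `out` are only illustrative, and the stable sort kind is my choice so that ties, if any, are broken in column order rather than concatenated.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 3, 2, 1, 2],
    'B': [2, 1, 3, 2, 3],
    'C': [3, 2, 1, 3, 1],
})

# For each row, the column positions sorted by rank (best first).
order = np.argsort(df.to_numpy(), axis=1, kind='stable')

# Map positions back to column names and label the result columns 1..n.
out = pd.DataFrame(
    df.columns.to_numpy()[order],
    index=df.index,
    columns=range(1, df.shape[1] + 1),
)
print(out)
```

With a tied row such as 1, 2, 2 this still returns one name per position, so it does not tell you that two options shared a rank.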
anky answered Nov 16 '22 11:11


Here is one way, using stack and unstack:

df.stack().reset_index(level=1).set_index(0,append=True)['level_1'].unstack()
Out[89]: 
0  1  2  3
0  A  B  C
1  B  C  A
2  C  A  B
3  A  B  C
4  C  A  B
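
Since the question mentions pivot, here is a rough equivalent of the chain above written with named levels and `pivot`. The labels 'row', 'option' and 'rank' are names I made up for readability. Like the unstack version, this only works when no rank is repeated within a row; with duplicates, `pivot` raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 3, 2, 1, 2],
    'B': [2, 1, 3, 2, 3],
    'C': [3, 2, 1, 3, 1],
})

# Long format: one line per (row, option) pair with its rank.
long = (df.rename_axis(index='row', columns='option')
          .stack()
          .reset_index(name='rank'))

# Spread the ranks into columns, with the option names as values.
out = long.pivot(index='row', columns='rank', values='option')
print(out)
```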
BENY answered Nov 16 '22 10:11


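Another way, which also concatenates options that share a rank. Note that row 3 of df is changed here so that B and C are both ranked 2: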

df = pd.DataFrame({
    'A': [1, 3, 2, 1, 2],
    'B': [2, 1, 3, 2, 3],
    'C': [3, 2, 1, 2, 1],
})

(df.stack()
   .reset_index()
   .groupby(['level_0',0])
   .level_1.apply(''.join)
   .unstack()
)

Output:

0        1   2    3
level_0            
0        A   B    C
1        B   C    A
2        C   A    B
3        A  BC  NaN
4        C   A    B
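
To get empty strings instead of NaN and to always have columns 1, 2 and 3 (even when a rank never occurs), the same idea can be wrapped up roughly as below. This is only a sketch: `invert_ranking` and its `positions` parameter are hypothetical names, and ranks outside `positions` are simply dropped, which the question says is acceptable.

```python
import pandas as pd

def invert_ranking(df, positions=(1, 2, 3)):
    """Hypothetical helper: rank positions become columns, option names the data."""
    # Long format: one line per (row, option) pair with its rank.
    long = (df.rename_axis(index='row', columns='option')
              .stack()
              .reset_index(name='rank'))
    # Concatenate the names of options that share a rank within a row.
    out = long.groupby(['row', 'rank'])['option'].agg(''.join).unstack()
    # Force the requested positions as columns; missing positions become ''.
    out = out.reindex(index=df.index, columns=list(positions)).fillna('')
    return out.rename_axis(index=None, columns=None)

df_bad = pd.DataFrame({'A': [1], 'B': [2], 'C': [2]})
print(invert_ranking(df_bad))
```

For the df_bad example from the question this gives A in position 1, BC in position 2 and an empty string in position 3.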
Quang Hoang answered Nov 16 '22 11:11