I have a data frame of similar format:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'p1': [0, 0, 1, 1, -2],
    'p2': [9, 2, 3, -5, 3],
    'p3': [1, 3, 10, 3, 7],
    'p4': [4, 4, 7, 1, 10]})
   p1  p2  p3  p4
0   0   9   1   4
1   0   2   3   4
2   1   3  10   7
3   1  -5   3   1
4  -2   3   7  10
Expected output:
top1 top2
p2:9 p4:4
p4:4 p3:3
p3:10 p4:7
p2:-5 p3:3
p4:10 p3:7
After a lot of research, I was able to sort by absolute value and obtain the indices of the sorted array. I was also able to replace those indices with column names. But I am unable to concatenate the column names with the corresponding row values.
nlargest = 3
order = np.argsort(-df.abs().values, axis=1)[:, :nlargest]
result = pd.DataFrame(df.columns[order],
columns=['top{}'.format(i) for i in range(1, nlargest+1)])
  top1 top2 top3
0   p2   p4   p3
1   p4   p3   p2
2   p3   p4   p2
3   p2   p3   p1
4   p4   p3   p2
Using the above method, I tried to sort the row values in a separate DataFrame, planning to concatenate the two afterwards. But I couldn't find the right way to do that, and I know this is not an optimal approach anyway.
result2 = pd.DataFrame(np.sort(df.values, axis=0), index=df.index, columns=df.columns)
result2 = result2.iloc[:, 0:nlargest]
result2.columns = ['top{}'.format(i) for i in range(1, nlargest+1)]
   top1  top2  top3
0    -2    -5     1
1     0     2     3
2     0     3     3
3     1     3     7
4     1     9    10
Please help me correct the sorting and find the shortest way to get the expected format.
For the best performance, use a pure NumPy solution:
nlargest = 3
arr = df.to_numpy()
order = np.argsort(-np.abs(arr), axis=1)[:, :nlargest]
print (order)
[[1 3 2]
[3 2 1]
[2 3 1]
[1 2 0]
[3 2 1]]
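One detail the snippet above leaves implicit is how ties are ordered. In row 3 the magnitudes of p1 and p4 are both 1, and the default sort kind of np.argsort is not guaranteed to be stable. A minimal sketch, assuming you want tied columns to keep their original left-to-right order, passes kind='stable' (the name order_stable is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'p1': [0, 0, 1, 1, -2],
    'p2': [9, 2, 3, -5, 3],
    'p3': [1, 3, 10, 3, 7],
    'p4': [4, 4, 7, 1, 10]})

nlargest = 3
arr = df.to_numpy()

# Negating the absolute values turns an ascending argsort into a
# "largest magnitude first" ordering along each row.
# kind='stable' keeps tied magnitudes in their original column order,
# so in row 3 (|p1| == |p4| == 1) p1 is ranked before p4.
order_stable = np.argsort(-np.abs(arr), axis=1, kind='stable')[:, :nlargest]
print(order_stable)
```

For this sample data the result is the same as the order array printed above; the stable sort only makes that outcome explicit rather than incidental.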
The idea is to reorder the original data in the NumPy array arr by the order array, using integer (fancy) indexing with one row index per row:
a = arr[np.arange(arr.shape[0])[:, None], order]
print (a)
[[ 9 4 1]
[ 4 3 2]
[10 7 3]
[-5 3 1]
[10 7 3]]
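The same row-wise gather can also be written with np.take_along_axis, which some readers find easier to read than the broadcasted row index. The sketch below rebuilds arr and order by hand so it runs on its own, and checks that both spellings agree:

```python
import numpy as np

# Same values as the DataFrame in the question, one row per record.
arr = np.array([[0, 9, 1, 4],
                [0, 2, 3, 4],
                [1, 3, 10, 7],
                [1, -5, 3, 1],
                [-2, 3, 7, 10]])
order = np.argsort(-np.abs(arr), axis=1, kind='stable')[:, :3]

# Fancy indexing: a column vector of row numbers broadcast against `order`.
a_fancy = arr[np.arange(arr.shape[0])[:, None], order]

# Equivalent gather along axis 1.
a_take = np.take_along_axis(arr, order, axis=1)

assert (a_fancy == a_take).all()
print(a_take)
```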
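Then join the column names with the values converted to strings. The column names are indexed through a plain NumPy array, because newer pandas versions no longer accept a 2-D indexer on an Index:
result = pd.DataFrame(df.columns.to_numpy()[order] + ':' + a.astype(str),
                      columns=['top{}'.format(i) for i in range(1, nlargest+1)])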
print (result)
    top1  top2  top3
0   p2:9  p4:4  p3:1
1   p4:4  p3:3  p2:2
2  p3:10  p4:7  p2:3
3  p2:-5  p3:3  p1:1
4  p4:10  p3:7  p2:3
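The steps above can be collected into one reusable function. This is only a sketch: the name top_n_by_abs is hypothetical, and the choices to keep the original index and to use a stable sort are assumptions rather than part of the answer. Called with n=2 it reproduces the two-column expected output from the question.

```python
import numpy as np
import pandas as pd

def top_n_by_abs(df, n):
    """Return 'column:value' labels for the n largest-magnitude entries per row."""
    arr = df.to_numpy()
    # Column positions ordered by decreasing absolute value, per row.
    order = np.argsort(-np.abs(arr), axis=1, kind='stable')[:, :n]
    # Values and column names gathered in that order.
    vals = np.take_along_axis(arr, order, axis=1)
    names = df.columns.to_numpy()[order]
    # Element-wise string concatenation: name + ':' + value.
    labels = np.char.add(np.char.add(names.astype(str), ':'), vals.astype(str))
    return pd.DataFrame(labels, index=df.index,
                        columns=['top{}'.format(i) for i in range(1, n + 1)])

df = pd.DataFrame({
    'p1': [0, 0, 1, 1, -2],
    'p2': [9, 2, 3, -5, 3],
    'p3': [1, 3, 10, 3, 7],
    'p4': [4, 4, 7, 1, 10]})

top2 = top_n_by_abs(df, 2)
print(top2)
```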
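Alternatively, starting from the result DataFrame of column names built in the question, use DataFrame.transform along with DataFrame.lookup. Note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0, so this works only on older versions: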
result = result.transform(lambda s: s + ':' + df.lookup(s.index, s).astype(str))
# print(result)
    top1  top2  top3
0   p2:9  p4:4  p3:1
1   p4:4  p3:3  p2:2
2  p3:10  p4:7  p2:3
3  p2:-5  p3:3  p1:1
4  p4:10  p3:7  p2:3
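On pandas versions where DataFrame.lookup is not available, the same per-row lookup can be approximated with Index.get_indexer and NumPy indexing. A minimal sketch: the helper lookup_values is hypothetical, and names stands in for the DataFrame of column names that the question calls result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'p1': [0, 0, 1, 1, -2],
    'p2': [9, 2, 3, -5, 3],
    'p3': [1, 3, 10, 3, 7],
    'p4': [4, 4, 7, 1, 10]})

# Column names per row, as produced by the argsort step in the question.
names = pd.DataFrame({
    'top1': ['p2', 'p4', 'p3', 'p2', 'p4'],
    'top2': ['p4', 'p3', 'p4', 'p3', 'p3'],
    'top3': ['p3', 'p2', 'p2', 'p1', 'p2']})

def lookup_values(frame, col_labels):
    """For each row i, return frame's value in the column named col_labels[i]."""
    col_pos = frame.columns.get_indexer(col_labels)   # label -> integer position
    return frame.to_numpy()[np.arange(len(frame)), col_pos]

# Apply column by column: label + ':' + looked-up value as a string.
result = names.apply(
    lambda s: s + ':' + pd.Series(lookup_values(df, s), index=s.index).astype(str))
print(result)
```

This assumes every label in names is an existing column of df; get_indexer returns -1 for unknown labels, which would silently select the last column.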