 

Sort each row by absolute value, independent of columns, along with column names

I have a DataFrame in a format similar to this:

import numpy as np
import pandas as pd

df = pd.DataFrame({
 'p1': [0, 0, 1, 1, -2],
 'p2': [9, 2, 3, -5, 3],
 'p3': [1, 3, 10, 3, 7],
 'p4': [4, 4, 7, 1, 10]})

    p1  p2  p3  p4
0   0   9   1   4
1   0   2   3   4
2   1   3   10  7
3   1   -5  3   1
4   -2  3   7   10

Expected output:

top1    top2
p2:9    p4:4
p4:4    p3:3
p3:10   p4:7
p2:-5   p3:3
p4:10   p3:7

After some research, I was able to sort each row by absolute value and obtain the indices of the sorted array. I was also able to replace the indices with the column names. But I am unable to concatenate them with the row values.

nlargest = 3
order = np.argsort(-df.abs().values, axis=1)[:, :nlargest]
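# Convert the column Index to a plain ndarray first: pandas 2.x raises on 2-D indexing of an Index
result = pd.DataFrame(df.columns.to_numpy()[order],
                      columns=['top{}'.format(i) for i in range(1, nlargest+1)])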

  top1 top2 top3
0   p2   p4   p3
1   p4   p3   p2
2   p3   p4   p2
3   p2   p3   p1
4   p4   p3   p2
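`np.argsort` only sorts in ascending order, so negating the absolute values makes the largest magnitude the smallest sort key. One subtlety: row 3 contains a tie (`p1` and `p4` are both 1), and the default sort kind is not guaranteed to be stable. A minimal sketch, assuming you want ties resolved toward the leftmost column, adds `kind='stable'` (this argument is not part of the original attempt):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'p1': [0, 0, 1, 1, -2],
    'p2': [9, 2, 3, -5, 3],
    'p3': [1, 3, 10, 3, 7],
    'p4': [4, 4, 7, 1, 10]})

# Negate the magnitudes so an ascending argsort yields descending |value|;
# kind='stable' keeps tied columns in their original left-to-right order
order = np.argsort(-df.abs().to_numpy(), axis=1, kind='stable')

print(order[3])  # row 3: |-5| > |3| > |1| == |1|  -> [1 2 0 3]
```

With a stable sort, `p1` (position 0) always comes before `p4` (position 3) in row 3, so the output no longer depends on the sorting algorithm NumPy happens to pick.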

Using the above method, I tried to sort the values in a separate DataFrame, planning to concatenate the two afterwards, but I couldn't find the right way to do it. I know this is not an optimal approach anyway.

result2 = pd.DataFrame(np.sort(df.values, axis=0), index=df.index, columns=df.columns)
result2 = result2.iloc[:, 0:nlargest]
result2.columns = ['top{}'.format(i) for i in range(1, nlargest+1)]

   top1  top2  top3
0    -2    -5     1
1     0     2     3
2     0     3     3
3     1     3     7
4     1     9    10

Please help me correct the sorting and find the shortest way to get the expected format.

supreeth2812 asked Mar 03 '23 08:03


2 Answers

For best performance, use a NumPy-only solution:

nlargest = 3
arr = df.to_numpy()
order = np.argsort(-np.abs(arr), axis=1)[:, :nlargest]
print(order)
[[1 3 2]
 [3 2 1]
 [2 3 1]
 [1 2 0]
 [3 2 1]]

The idea is to reorder the values of the original NumPy array `arr` with the `order` array, using integer (fancy) indexing:

a = arr[np.arange(arr.shape[0])[:, None], order]
print(a)
[[ 9  4  1]
 [ 4  3  2]
 [10  7  3]
 [-5  3  1]
 [10  7  3]]
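The same row-wise gather can be written with `np.take_along_axis`, which pairs each row of `arr` with the matching row of `order` without building the `np.arange(...)[:, None]` row index by hand. A minimal sketch, reusing the `df` from the question, that checks both forms agree:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'p1': [0, 0, 1, 1, -2],
    'p2': [9, 2, 3, -5, 3],
    'p3': [1, 3, 10, 3, 7],
    'p4': [4, 4, 7, 1, 10]})

nlargest = 3
arr = df.to_numpy()
order = np.argsort(-np.abs(arr), axis=1)[:, :nlargest]

# Explicit fancy indexing: the (n_rows, 1) row index broadcasts against order
a_fancy = arr[np.arange(arr.shape[0])[:, None], order]

# Equivalent: take_along_axis picks arr[i, order[i, j]] for every (i, j)
a_take = np.take_along_axis(arr, order, axis=1)

print(np.array_equal(a_fancy, a_take))  # True
```

Either form keeps the original sign of each value, which is why `-5` appears in row 3 even though the sort key was its magnitude.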

Then concatenate the column names with the values converted to strings:

# to_numpy() avoids 2-D indexing of the Index itself, which pandas 2.x rejects
result = pd.DataFrame(df.columns.to_numpy()[order] + ':' + a.astype(str),
                      columns=['top{}'.format(i) for i in range(1, nlargest+1)])

print(result)
    top1  top2  top3
0   p2:9  p4:4  p3:1
1   p4:4  p3:3  p2:2
2  p3:10  p4:7  p2:3
3  p2:-5  p3:3  p1:1
4  p4:10  p3:7  p2:3
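Put together, the steps above fit in one self-contained function. This is a sketch, not code from the answer: the name `top_abs` is hypothetical, and it swaps in `np.take_along_axis` for the explicit fancy indexing and `kind='stable'` for deterministic tie-breaking.

```python
import numpy as np
import pandas as pd


def top_abs(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Label the n largest-magnitude entries of each row as 'column:value'."""
    arr = df.to_numpy()
    # Column positions by descending |value|; stable sort breaks ties left to right
    order = np.argsort(-np.abs(arr), axis=1, kind='stable')[:, :n]
    # Signed values gathered in that order (the original sign is kept)
    values = np.take_along_axis(arr, order, axis=1)
    # Plain ndarray of column labels, so 2-D fancy indexing is allowed
    names = df.columns.to_numpy()[order]
    labels = names + ':' + values.astype(str)
    return pd.DataFrame(labels, index=df.index,
                        columns=[f'top{i}' for i in range(1, n + 1)])


df = pd.DataFrame({
    'p1': [0, 0, 1, 1, -2],
    'p2': [9, 2, 3, -5, 3],
    'p3': [1, 3, 10, 3, 7],
    'p4': [4, 4, 7, 1, 10]})

print(top_abs(df, 3))
```

With `n=3` this should produce the same labels as the table above; `top_abs(df, 2)` gives the two-column layout from the question's expected output.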
jezrael answered May 03 '23 02:05


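Use `DataFrame.transform` together with `DataFrame.lookup`, applied to the `result` DataFrame of column names from the question (note that `DataFrame.lookup` was removed in pandas 2.0, so this line needs an older pandas):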

result = result.transform(lambda s: s + ':' + df.lookup(s.index, s).astype(str))

# print(result)
    top1  top2  top3
0   p2:9  p4:4  p3:1
1   p4:4  p3:3  p2:2
2  p3:10  p4:7  p2:3
3  p2:-5  p3:3  p1:1
4  p4:10  p3:7  p2:3
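On pandas 2.0 and later, the same `transform` still works if the label lookup is done by hand. A minimal sketch: the `lookup` helper below is hypothetical (not a pandas API); it maps row and column labels to positions with `Index.get_indexer` and reads the values from `to_numpy()`, assuming both the row and column labels are unique.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'p1': [0, 0, 1, 1, -2],
    'p2': [9, 2, 3, -5, 3],
    'p3': [1, 3, 10, 3, 7],
    'p4': [4, 4, 7, 1, 10]})

nlargest = 3
order = np.argsort(-df.abs().to_numpy(), axis=1)[:, :nlargest]
result = pd.DataFrame(df.columns.to_numpy()[order],
                      columns=[f'top{i}' for i in range(1, nlargest + 1)])


def lookup(frame, row_labels, col_labels):
    """Hypothetical stand-in for the removed DataFrame.lookup: frame.loc[r, c] per pair."""
    rows = frame.index.get_indexer(row_labels)     # label -> row position
    cols = frame.columns.get_indexer(col_labels)   # label -> column position
    return frame.to_numpy()[rows, cols]


# transform passes each column of result (a Series of column names) to the lambda
result = result.transform(lambda s: s + ':' + lookup(df, s.index, s).astype(str))
print(result)
```

Because `result` shares `df`'s default integer index, `s.index` lines up row-for-row with `df`.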
Shubham Sharma answered May 03 '23 01:05