Get Column Names Sorted by their Values in a DataFrame

I have a huge DataFrame from which I would like to build a dictionary. The keys will be the row index labels, and the values will be lists of the DataFrame's column names, sorted in descending order of the values in that row. Consider the example below:

df=      23    45    12     3     6
    45   0.2   1     0.12   0.5   0.1
    12   0.5   0.2   1      0.3   0.9
    23   0.1   0.9   0.3    1     0.5

I would like to create a dictionary in the following form:

dict={ '45':['45','3','23','12','6'], 
       '12':['12','6','23','3','45'], 
       '23':['3','45','6','12','23']}

where the values are column names sorted by their values in that row. I tried the following:

for idx, row in df.iterrows():
    l = row.values.tolist()
    l.sort(reverse=True)
    print(idx, l)

but this gives me the values and not the column names sorted in descending order. Any help on how I can produce the desired result will be appreciated. Thanks.

BajajG asked Oct 19 '22 16:10


1 Answer

Well, this seems to work:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 10, 50], 'B': [2, -8, 3, 7], 'C': [1, 10, -20, 1]})

>>> dict([(r[0], list(df.columns[np.argsort(list(r)[1:])]))
...       for r in list(df.to_records())])
{0: ['A', 'C', 'B'],
 1: ['B', 'A', 'C'],
 2: ['C', 'B', 'A'],
 3: ['C', 'B', 'A']}

Explanation:

  • list(df.to_records()) is a list of the rows as records, each holding the index label followed by the row's values.
  • r[0] is the first element of a record, i.e. the row's index label.
  • list(r)[1:] is the rest of the record, i.e. the row's values.
  • np.argsort returns the indices that would sort an array in ascending order.
  • dict(list_of_pairs) creates a dictionary from a list of (key, value) pairs.
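
Note that np.argsort sorts in ascending order, while the question asks for descending order. Here is a minimal sketch of one way to get that on the question's own data, by negating the values before sorting. It assumes the row and column labels are integers (the question shows them as strings in the desired dict) and that all values are numeric; the names order and result are just for illustration.

```python
import numpy as np
import pandas as pd

# The example frame from the question; integer labels are an assumption.
df = pd.DataFrame(
    [[0.2, 1.0, 0.12, 0.5, 0.1],
     [0.5, 0.2, 1.0, 0.3, 0.9],
     [0.1, 0.9, 0.3, 1.0, 0.5]],
    index=[45, 12, 23],
    columns=[23, 45, 12, 3, 6],
)

# argsort is ascending, so sort the negated values to get descending order.
# axis=1 sorts within each row; kind="stable" keeps ties in column order.
order = np.argsort(-df.to_numpy(), axis=1, kind="stable")

# Map each row label to its column labels, largest value first.
result = {idx: df.columns[pos].tolist() for idx, pos in zip(df.index, order)}
print(result)
# {45: [45, 3, 23, 12, 6], 12: [12, 6, 23, 3, 45], 23: [3, 45, 6, 12, 23]}
```

Sorting the whole array at once avoids building a record per row, which may matter for a huge frame. If string labels are needed, as in the question, converting with str() on the keys and list items should do.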
Ami Tavory answered Oct 21 '22 06:10