I have a huge dataframe from which I would like to create a dictionary. The keys of the dictionary will be the row index labels, and the values will be lists of the dataframe's column names sorted by the values in that row (descending order). Consider the example below:
df =    23    45    12     3     6
45     0.2     1  0.12   0.5   0.1
12     0.5   0.2     1   0.3   0.9
23     0.1   0.9   0.3     1   0.5
I would like to create a dictionary in the following form:
dict={ '45':['45','3','23','12','6'],
       '12':['12','6','23','3','45'],
       '23':['3','45','6','12','23']}
where the values are column names sorted by their values in that row. I tried the following:
for idx, row in df.iterrows():
    l = row.values.tolist()
    l.sort(reverse=True)
    print(idx, l)
but this gives me the values and not the column names sorted in descending order. Any help on how I can produce the desired result will be appreciated. Thanks.
Well, this seems to work:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 3, 10, 50], 'B': [2, -8, 3, 7], 'C': [1, 10, -20, 1]})
>>> dict([(r[0], list(df.columns[np.argsort(list(r)[1:])]))
...       for r in list(df.to_records())])
{0: ['A', 'C', 'B'],
1: ['B', 'A', 'C'],
2: ['C', 'B', 'A'],
3: ['C', 'B', 'A']}
Explanation:
list(df.to_records()) is a list of rows as tuples.
r[0] is the first element in the tuple (the row index).
list(r)[1:] is the rest of the tuple (the row's values).
np.argsort returns the indices that would sort an array in ascending order.
dict(list_of_pairs) creates a dictionary from a list of pairs.
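Note that np.argsort orders values from smallest to largest, so the output above is in ascending order, while the question asks for descending. Below is a minimal sketch of a descending variant applied to the question's example frame. The helper name columns_by_row_value and its descending parameter are made up for illustration, the string labels are assumed from the desired output in the question, and negating the values assumes every column is numeric.

```python
import numpy as np
import pandas as pd

# The question's example frame. String labels are an assumption,
# taken from the quoted keys in the desired dictionary.
df = pd.DataFrame(
    [[0.2, 1.0, 0.12, 0.5, 0.1],
     [0.5, 0.2, 1.0, 0.3, 0.9],
     [0.1, 0.9, 0.3, 1.0, 0.5]],
    index=["45", "12", "23"],
    columns=["23", "45", "12", "3", "6"],
)


def columns_by_row_value(frame, descending=True):
    """Map each row label to the column names ordered by that row's values."""
    values = frame.to_numpy()
    # Negating the values turns an ascending argsort into a descending one.
    # kind="stable" keeps tied columns in their original left-to-right order.
    keys = -values if descending else values
    order = np.argsort(keys, axis=1, kind="stable")
    names = frame.columns.to_numpy()
    return {label: names[idx].tolist() for label, idx in zip(frame.index, order)}


result = columns_by_row_value(df)
for label, cols in result.items():
    print(label, cols)
```

Sorting the whole array at once with axis=1 avoids a Python-level loop over rows, which may matter for a huge dataframe. Passing descending=False gives the same ascending order as the one-liner above.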