Here is my pandas.DataFrame:
import pandas as pd
data = pd.DataFrame({
    'first': [40, 32, 56, 12, 89],
    'second': [13, 45, 76, 19, 45],
    'third': [98, 56, 87, 12, 67]
}, index = ['first', 'second', 'third', 'fourth', 'fifth'])
I want to create a new DataFrame that will contain the top 3 values from each column of my data DataFrame.
Here is an expected output:
   first  second  third
0     89      76     98
1     56      45     87
2     40      45     67
How can I do that?
Create a function to return the top three values of a series:
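def top_n(s, num):
    # Sort descending and keep the first num values
    # (very old pandas versions used s.order(..) instead of sort_values)
    tmp = s.sort_values(ascending=False)[:num]
    # Replace the original row labels with 0..num-1 so the columns line up
    tmp.index = range(num)
    return tmp
Apply it to your data set (the helper is named top_n rather than sorted to avoid shadowing the Python built-in):
In [1]: data.apply(lambda x: top_n(x, 3))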
Out[1]:
   first  second  third
0     89      76     98
1     56      45     87
2     40      45     67
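As an alternative sketch, pandas also provides Series.nlargest, which can stand in for the hand-written helper. The variable name top_values is only illustrative, and the row labels are reset so that the result matches the expected output from the question.

```python
import pandas as pd

data = pd.DataFrame({
    'first': [40, 32, 56, 12, 89],
    'second': [13, 45, 76, 19, 45],
    'third': [98, 56, 87, 12, 67]
}, index=['first', 'second', 'third', 'fourth', 'fifth'])

# For each column, keep the 3 largest values (returned in descending order)
# and replace the original row labels with 0, 1, 2 so the columns line up.
top_values = data.apply(lambda s: s.nlargest(3).reset_index(drop=True))
print(top_values)
```

Resetting the index matters here: without it, each column would keep its own row labels and the combined frame would be padded with missing values.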
With numpy, you can get an array of the top 3 values along each column as follows:
>>> import numpy as np
>>> col_ind = np.argsort(data.values, axis=0)[::-1,:]
>>> ind_to_take = col_ind[:3,:] + np.arange(data.shape[1])*data.shape[0]
>>> np.take(data.values.T, ind_to_take)
array([[89, 76, 98],
       [56, 45, 87],
       [40, 45, 67]], dtype=int64)
You can convert it back to a DataFrame:
>>> pd.DataFrame(_, columns = data.columns, index=data.index[:3])
        first  second  third
first      89      76     98
second     56      45     87
third      40      45     67
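The flat-index arithmetic above can be hard to follow. A minimal sketch of the same idea, assuming a NumPy version that provides np.take_along_axis (1.15 or later), gathers the values column by column without computing offsets by hand. The names order, top3, and result are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'first': [40, 32, 56, 12, 89],
    'second': [13, 45, 76, 19, 45],
    'third': [98, 56, 87, 12, 67]
}, index=['first', 'second', 'third', 'fourth', 'fifth'])

values = data.to_numpy()

# Row positions that sort each column ascending, reversed to get descending order.
order = np.argsort(values, axis=0)[::-1]

# Keep the first 3 row positions per column and gather the matching values.
top3 = np.take_along_axis(values, order[:3], axis=0)

# A fresh 0..2 index avoids implying that the values belong to the original rows.
result = pd.DataFrame(top3, columns=data.columns)
print(result)
```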
The other solutions (at the time of writing this) sort the DataFrame with super-linear complexity per column, but it can actually be done in linear time per column.
First, note that numpy.partition moves the k smallest elements into the first k positions (in no particular order). To get the k largest elements, we can negate the values, partition, and negate the result again:
import numpy as np
-np.partition(-v, k)[: k]
Combining this with a dictionary comprehension, we can use:
>>> pd.DataFrame({c: -np.partition(-data[c], 3)[: 3] for c in data.columns})
   first  second  third
0     89      76     98
1     56      45     87
2     40      45     67
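Because numpy.partition does not guarantee any order among the k selected values, a result that must be in descending order needs one extra sort of just those k values. The following is a minimal sketch of that variant; top_k_per_column is a hypothetical helper name, and it partitions from the end of each column instead of negating the values, which is a small departure from the snippet above.

```python
import numpy as np
import pandas as pd

def top_k_per_column(df, k):
    """Hypothetical helper: the k largest values of each column, descending.

    Assumes 1 <= k <= len(df) and a numeric DataFrame.
    """
    values = df.to_numpy()
    # After partitioning at position n - k, the last k rows of every column
    # hold that column's k largest values, in arbitrary order.
    part = np.partition(values, len(df) - k, axis=0)[-k:]
    # Sort only those k rows, then reverse the rows to get descending order.
    top = np.sort(part, axis=0)[::-1]
    return pd.DataFrame(top, columns=df.columns)

data = pd.DataFrame({
    'first': [40, 32, 56, 12, 89],
    'second': [13, 45, 76, 19, 45],
    'third': [98, 56, 87, 12, 67]
}, index=['first', 'second', 'third', 'fourth', 'fifth'])

result = top_k_per_column(data, 3)
print(result)
```

The partition step stays linear in the column length, and the final sort touches only k values per column, so the extra cost is small when k is much smaller than the number of rows.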