I have a dataframe with sorted columns, something like this:
df = pd.DataFrame({q: np.sort(np.random.randn(10).round(2)) for q in ['blue', 'green', 'red']})
blue green red
0 -2.15 -0.76 -2.62
1 -0.88 -0.62 -1.65
2 -0.77 -0.55 -1.51
3 -0.73 -0.17 -1.14
4 -0.06 -0.16 -0.75
5 -0.03 0.05 -0.08
6 0.06 0.38 0.37
7 0.41 0.76 1.04
8 0.56 0.89 1.16
9 0.97 2.94 1.79
What I want to know is how many of the n smallest elements in the whole frame are in each column. This is the only thing I came up with:
is_small = df.isin(np.partition(df.values.flatten(), n)[:n])
With n=10, it looks like this:
blue green red
0 True True True
1 True False True
2 True False True
3 True False True
4 False False True
5 False False False
6 False False False
7 False False False
8 False False False
9 False False False
Then by applying np.sum I get the number corresponding to each column.
I'm dissatisfied with this solution because it in no way utilizes the sortedness of the original data. All the data gets partitioned and all the data is then checked for whether it's in the partition. It seems wasteful, and I can't seem to figure out a better way.
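I think you can find the largest of the n smallest values, compare every column against it, and then use idxmin to take advantage of the sorting. Because each column is sorted and the frame has the default 0, 1, 2, ... index, the position of the first False in a column is the number of its values at or below that threshold -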
# Find largest of n smallest numbers
N = (np.partition(df.values.flatten(), n)[:n]).max()
out = (df<=N).idxmin(axis=0)
Sample run -
In [152]: np.random.seed(0)
In [153]: df = pd.DataFrame({q: np.sort(np.random.randn(10).round(2)) \
for q in ['blue', 'green', 'red']})
In [154]: df
Out[154]:
blue green red
0 -0.98 -0.85 -2.55
1 -0.15 -0.21 -1.45
2 -0.10 0.12 -0.74
3 0.40 0.14 -0.19
4 0.41 0.31 0.05
5 0.95 0.33 0.65
6 0.98 0.44 0.86
7 1.76 0.76 1.47
8 1.87 1.45 1.53
9 2.24 1.49 2.27
In [198]: n = 5
In [199]: N = (np.partition(df.values.flatten(), n)[:n]).max()
In [200]: (df<=N).idxmin(axis=0)
Out[200]:
blue 1
green 1
red 3
dtype: int64
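One caveat with the idxmin approach: if every value in a column is at or below the threshold, the boolean column is all True and idxmin returns 0 instead of the column length. A minimal sketch of a variant that avoids this by binary-searching each sorted column with np.searchsorted is below. The function name count_n_smallest_per_column is made up for illustration, and the sketch assumes every column is sorted ascending and 1 <= n <= df.size. If the threshold value occurs more than once, the counts can add up to more than n.

```python
import numpy as np
import pandas as pd

def count_n_smallest_per_column(df, n):
    """Count how many values <= the n-th smallest overall sit in each column.

    Hypothetical helper: assumes sorted columns and 1 <= n <= df.size.
    """
    values = df.to_numpy()
    # n-th smallest value overall (kth is zero-based, hence n - 1)
    threshold = np.partition(values.ravel(), n - 1)[n - 1]
    # In a sorted column, the right-side insertion point of the threshold
    # equals the number of entries <= threshold (one binary search per column).
    counts = {
        col: int(np.searchsorted(df[col].to_numpy(), threshold, side="right"))
        for col in df.columns
    }
    return pd.Series(counts)

# Frame copied from the sample run above
df = pd.DataFrame({
    "blue":  [-0.98, -0.15, -0.10, 0.40, 0.41, 0.95, 0.98, 1.76, 1.87, 2.24],
    "green": [-0.85, -0.21, 0.12, 0.14, 0.31, 0.33, 0.44, 0.76, 1.45, 1.49],
    "red":   [-2.55, -1.45, -0.74, -0.19, 0.05, 0.65, 0.86, 1.47, 1.53, 2.27],
})
counts = count_n_smallest_per_column(df, 5)
print(counts)
```

The partition still touches every value once to find the threshold, but the per-column counting drops from a full comparison to a binary search.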
Let's say you are looking for the 10 smallest values. You can stack the frame, take the 10 smallest entries, and call value_counts on their column labels:
df.stack().nsmallest(10).index.get_level_values(1).value_counts()
For the frame in the question, you get:
red 5
blue 4
green 1
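If the goal is to avoid looking at all the data, one possible alternative (not part of the answers above) is to merge the already-sorted columns lazily and stop after n items. The sketch below uses heapq.merge from the standard library. The function name count_n_smallest_by_merge is made up for illustration, and it assumes every column is sorted ascending. Ties are broken by column label, so the counts always add up to exactly n.

```python
import heapq
from collections import Counter
from itertools import islice, repeat

import pandas as pd

def count_n_smallest_by_merge(df, n):
    """Count, per column, how many of the n smallest values it holds.

    Hypothetical helper: assumes each column is sorted ascending.
    """
    # One sorted stream of (value, column_label) pairs per column
    streams = [zip(df[col].tolist(), repeat(col)) for col in df.columns]
    # heapq.merge yields pairs in ascending order without sorting everything;
    # islice stops after the first n pairs.
    first_n = islice(heapq.merge(*streams), n)
    seen = Counter(col for _, col in first_n)
    # Report every column, including those with no hits
    return pd.Series({col: seen.get(col, 0) for col in df.columns})

# Frame copied from the question
df = pd.DataFrame({
    "blue":  [-2.15, -0.88, -0.77, -0.73, -0.06, -0.03, 0.06, 0.41, 0.56, 0.97],
    "green": [-0.76, -0.62, -0.55, -0.17, -0.16, 0.05, 0.38, 0.76, 0.89, 2.94],
    "red":   [-2.62, -1.65, -1.51, -1.14, -0.75, -0.08, 0.37, 1.04, 1.16, 1.79],
})
merge_counts = count_n_smallest_by_merge(df, 10)
print(merge_counts)
```

This runs in pure Python, so for small frames the vectorized stack approach is likely faster in practice. The merge only pays off when n is small relative to the total number of values.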