Let's say I have a Pandas dataframe with a column of measurement parameters and a column of corresponding measurement values.
ID     Parameter     Value
0      'A'           4.3
1      'B'           3.1
2      'C'           8.9
3      'A'           2.1
4      'A'           3.9
.      .             .
.      .             .
.      .             .
100    'B'           3.8
How can I filter this dataframe to keep only the parameters that appear more than X times? For example, for this dataframe I want all rows whose parameter has more than 5 measurements (let's say only parameters 'A' and 'B' appear more than 5 times), giving a dataframe like the one below.
ID     Parameter     Value
0      'A'           4.3
1      'B'           3.1
3      'A'           2.1
.      .             .
.      .             .
.      .             .
100    'B'           3.8
You can use value_counts + isin:
v = df.Parameter.value_counts()
df[df.Parameter.isin(v.index[v.gt(5)])]
For example, with X = 2 (keep all parameters that have more than 2 readings):
df
   ID Parameter  Value
0   0         A    4.3
1   1         B    3.1
2   2         C    8.9
3   3         A    2.1
4   4         A    3.9
5   5         B    4.5
v = df.Parameter.value_counts()
v
A    3
B    2
C    1
Name: Parameter, dtype: int64
df[df.Parameter.isin(v.index[v.gt(2)])]
   ID Parameter  Value
0   0         A    4.3
3   3         A    2.1
4   4         A    3.9
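The steps above can be wrapped into a small reusable function. This is only a sketch: the helper name filter_by_count and its arguments are made up for illustration, and the sample data is the six-row frame from the example.

```python
import pandas as pd

# Sample data copied from the six-row example above.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Parameter": ["A", "B", "C", "A", "A", "B"],
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})


def filter_by_count(frame, column, threshold):
    """Hypothetical helper: keep rows whose `column` value occurs
    strictly more than `threshold` times in `frame`."""
    # Count how often each distinct value occurs.
    counts = frame[column].value_counts()
    # Labels whose count exceeds the threshold.
    frequent = counts.index[counts.gt(threshold)]
    # Boolean mask: True where the row's value is one of the frequent labels.
    return frame[frame[column].isin(frequent)]


result = filter_by_count(df, "Parameter", 2)
print(result)
```

Note that gt is a strict comparison, so a parameter with exactly 2 readings ('B' here) is dropped. Use counts.ge(threshold) instead if "at least X" is what you need.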
Use transform + size with boolean indexing:
df[df.groupby('Parameter')['Parameter'].transform('size') > 5]
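To see why this works, it can help to look at the intermediate Series. The sketch below reuses the six-row sample frame from the earlier example and a threshold of 2 instead of 5 (both are assumptions for illustration). transform('size') returns one value per original row, namely the size of that row's group, so the comparison yields a mask aligned with df.

```python
import pandas as pd

# Same six-row sample frame as in the earlier example (illustrative data).
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Parameter": ["A", "B", "C", "A", "A", "B"],
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})

# One entry per row: how many rows share that row's Parameter.
sizes = df.groupby("Parameter")["Parameter"].transform("size")
print(sizes.tolist())  # [3, 2, 1, 3, 3, 2]

# Keep only rows whose group has more than 2 members.
filtered = df[sizes > 2]
print(filtered)
```

Because the mask is built in a single pass and keeps the original index, the row order of df is preserved in the result.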