I have a question about how to filter and select anomalous groups from a large df. For example, I have a df:
import pandas as pd
import numpy as np
data = {"code": ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd'],
"number": [7, 5, 2, 4, 6, 9, 6, 2, 8, 2]}
df = pd.DataFrame(data=data)
code number
0 a 7
1 a 5
2 a 2
3 b 4
4 b 6
5 c 9
6 c 6
7 c 2
8 d 8
9 d 2
In this df, most of the data follow a rule: within the same 'code' group, the numbers appear in decreasing order. For example, in group 'a' the values are 7 > 5 > 2; in group 'c' they are 9 > 6 > 2; and group 'd' follows the same pattern with 8 > 2. The only exception is group 'b', where the smaller value 4 comes before 6. So I want to keep only the anomalous group 'b' and get an output like:
code number
0 b 4
1 b 6
Does anyone have any ideas? Any help is much appreciated.
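You can use groupby with filter and test each group with diff. A group is kept if any value in it is larger than the value before it: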
df.groupby('code').filter(lambda x : (x.number.diff()>0).any())
code number
3 b 4
4 b 6
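Since the question mentions a large df, here is a minimal sketch of an alternative that avoids calling a Python lambda once per group. It assumes that "anomalous" means at least one strict increase inside a group; the names increase, mask and result are only illustrative. It also resets the index so the output matches the 0, 1 index shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "code": ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd'],
    "number": [7, 5, 2, 4, 6, 9, 6, 2, 8, 2],
})

# Difference to the previous row within each group.
# The first row of every group is NaN, and NaN > 0 evaluates to False.
increase = df.groupby('code')['number'].diff() > 0

# Broadcast "this group contains at least one increase" to every row of the group.
mask = increase.groupby(df['code']).transform('any')

# Keep only the flagged groups and renumber the rows from 0.
result = df[mask].reset_index(drop=True)
print(result)
```

With > 0, two equal consecutive values are not treated as an anomaly; use >= 0 instead if ties should be flagged as well.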