Filter anomalous and complex datasets

I have a question about how to filter and select anomalous subsets from a large DataFrame. For example, given this df:

import pandas as pd
import numpy as np

data = {"code": ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd'],
        "number": [7, 5, 2, 4, 6, 9, 6, 2, 8, 2]}

df = pd.DataFrame(data=data)

  code  number
0    a       7
1    a       5
2    a       2
3    b       4
4    b       6
5    c       9
6    c       6
7    c       2
8    d       8
9    d       2

In this df, most of the data follow a rule: within the same 'code' group, the numbers are in descending order, so the larger numbers come first. For example, in group 'a' the values are 7 > 5 > 2; in group 'c' they are 9 > 6 > 2; group 'd' follows the same pattern, 8 > 2. Only group 'b' breaks the rule, because the smaller value 4 comes before 6. So I want to select only the anomalous subset b and get output like:

  code  number
0    b       4
1    b       6

Does anyone have any ideas? Any help is much appreciated.

asked Jun 21 '20 at 23:06 by Alice jinx



1 Answer

We can use groupby().filter() together with diff(): a group is kept if any number in it is larger than the number right before it.

df.groupby('code').filter(lambda x: (x['number'].diff() > 0).any())
  code  number
3    b       4
4    b       6
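To see why the one-liner works, here is a step-by-step sketch on the question's df. GroupBy.diff() subtracts the previous value within each 'code' group (the first row of each group has no predecessor and gets NaN), so a positive difference marks a spot where a number is larger than the one before it. The last step is an equivalent, lambda-free variant that broadcasts the per-group result with transform('any'); the variable names step, violation and mask are just illustrative.

```python
import pandas as pd

data = {"code": ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd'],
        "number": [7, 5, 2, 4, 6, 9, 6, 2, 8, 2]}
df = pd.DataFrame(data=data)

# Difference from the previous row *within the same group*;
# the first row of every group has no predecessor, so it is NaN.
step = df.groupby('code')['number'].diff()
print(step.tolist())
# [nan, -2.0, -3.0, nan, 2.0, nan, -3.0, -4.0, nan, -6.0]

# A positive step means the descending order was violated.
# NaN > 0 evaluates to False, so the first row of a group never flags it.
violation = step.gt(0)

# Broadcast "does this group contain any violation?" back to every row.
mask = violation.groupby(df['code']).transform('any')

anomalous = df[mask]
print(anomalous)
```

Both versions keep the original row labels (3 and 4) because they select rows from df without renumbering them.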
answered Oct 20 '22 at 05:10 by BENY

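The question's expected output is numbered 0 and 1, whereas filter keeps the original row labels 3 and 4; appending reset_index(drop=True) renumbers the rows. As a hedged alternative (a swapped-in technique, not the answer's own method), the rule can also be phrased with the pandas Series.is_monotonic_decreasing property, which is True for non-increasing sequences. Like the diff() > 0 test, it treats equal adjacent values as not anomalous.

```python
import pandas as pd

data = {"code": ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd'],
        "number": [7, 5, 2, 4, 6, 9, 6, 2, 8, 2]}
df = pd.DataFrame(data=data)

# Keep only groups whose numbers are NOT in non-increasing order.
# is_monotonic_decreasing allows ties (5, 5, 2 counts as decreasing),
# which matches the diff() > 0 test used in the answer.
out = (df.groupby('code')
         .filter(lambda g: not g['number'].is_monotonic_decreasing)
         .reset_index(drop=True))  # renumber rows 0, 1, ... as in the question
print(out)
#   code  number
# 0    b       4
# 1    b       6
```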