
How do you remove values not in a cluster using a pandas data frame?

I have a pandas DataFrame like this, made up of 0s and 1s:

 1 1 1 0 0 0 0 1 0
 1 1 1 1 1 0 0 0 0
 1 1 1 0 0 0 0 1 0 
 1 0 0 0 0 1 0 0 0 

How do I filter out the outliers so that I get something like this:

 1 1 1 0 0 0 0 0 0 
 1 1 1 1 1 0 0 0 0
 1 1 1 0 0 0 0 0 0
 1 0 0 0 0 0 0 0 0

That is, every 1 that is not part of the leading run of 1s in its row should be set to 0.

Zmann3000 asked Feb 03 '23 21:02


1 Answer

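We can do this with a cumulative product over the second axis (axis=1, that is, along each row) with DataFrame.cumprod [pandas-doc]. Once a 0 appears in a row, every later product is 0, so only the leading run of 1s survives: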

>>> df.cumprod(axis=1)
   0  1  2  3  4  5  6  7  8
0  1  1  1  0  0  0  0  0  0
1  1  1  1  1  1  0  0  0  0
2  1  1  1  0  0  0  0  0  0
3  1  0  0  0  0  0  0  0  0

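Because the frame contains only 0s and 1s, the same result can be obtained with DataFrame.cummin [pandas-doc]: the running minimum also drops to 0 at the first 0 and stays there: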

>>> df.cummin(axis=1)
   0  1  2  3  4  5  6  7  8
0  1  1  1  0  0  0  0  0  0
1  1  1  1  1  1  0  0  0  0
2  1  1  1  0  0  0  0  0  0
3  1  0  0  0  0  0  0  0  0
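
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the frame from the question is built by hand with default integer column labels (the question does not show how df was created), and the names by_cumprod and by_cummin are only illustrative:

```python
import pandas as pd

# Assumed reconstruction of the frame from the question (0/1 values only).
df = pd.DataFrame([
    [1, 1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0],
])

# Running product along each row: becomes 0 at the first 0 and stays 0.
by_cumprod = df.cumprod(axis=1)

# Running minimum along each row: equivalent here because values are 0 or 1.
by_cummin = df.cummin(axis=1)

print(by_cumprod)
print(by_cumprod.equals(by_cummin))
```

Note that both approaches keep only the run of 1s that starts in the first column. A row that begins with 0 ends up all zeros, so this is a sketch for the layout shown in the question, not a general outlier filter.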
Willem Van Onsem answered Feb 06 '23 14:02