How do you remove values not in a cluster using a pandas data frame?

Question

If I have a pandas data frame like this, made up of 0s and 1s:

 1 1 1 0 0 0 0 1 0
 1 1 1 1 1 0 0 0 0
 1 1 1 0 0 0 0 1 0 
 1 0 0 0 0 1 0 0 0

How do I filter out outliers such that I get something like this:

 1 1 1 0 0 0 0 0 0 
 1 1 1 1 1 0 0 0 0
 1 1 1 0 0 0 0 0 0
 1 0 0 0 0 0 0 0 0

That is, each row should keep only its leading run of 1s, and any 1 that appears after the first 0 should be removed.

Willem Van Onsem · Accepted Answer

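We can do this with a cumulative product over the second axis (axis=1, that is, along each row) using DataFrame.cumprod [pandas-doc]. The running product drops to 0 at the first 0 in a row and stays 0 from then on, so only the leading run of 1s survives: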

>>> df.cumprod(axis=1)
   0  1  2  3  4  5  6  7  8
0  1  1  1  0  0  0  0  0  0
1  1  1  1  1  1  0  0  0  0
2  1  1  1  0  0  0  0  0  0
3  1  0  0  0  0  0  0  0  0
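
To make the snippet above reproducible, here is a minimal sketch that rebuilds the frame from the question and applies the cumulative product. The variable names `df` and `filtered` and the default integer column labels are assumptions; the question does not show how the frame was created.

```python
import pandas as pd

# Example frame from the question (default integer labels are assumed).
df = pd.DataFrame([
    [1, 1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0],
])

# Running product along each row: after the first 0, every later value is 0.
filtered = df.cumprod(axis=1)
print(filtered)
```

Note that this keeps only the run of 1s that starts in the first column; a row that begins with 0 becomes all zeros.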

Because the frame contains only 0s and 1s, the same result can be obtained here with DataFrame.cummin [pandas-doc]:

>>> df.cummin(axis=1)
   0  1  2  3  4  5  6  7  8
0  1  1  1  0  0  0  0  0  0
1  1  1  1  1  1  0  0  0  0
2  1  1  1  0  0  0  0  0  0
3  1  0  0  0  0  0  0  0  0
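
The two approaches agree only because the data is binary. The sketch below, using small made-up frames (`binary` and `other` are hypothetical names, not from the question), checks the agreement on 0/1 data and shows how the results diverge once other values appear.

```python
import pandas as pd

# On 0/1 data the running product and the running minimum coincide.
binary = pd.DataFrame([[1, 1, 0, 1], [1, 0, 0, 1]])
same = binary.cumprod(axis=1).equals(binary.cummin(axis=1))

# On non-binary data they differ before the first 0 is reached.
other = pd.DataFrame([[2, 3, 0, 5]])
prod = other.cumprod(axis=1)  # running product: 2, 6, 0, 0
low = other.cummin(axis=1)    # running minimum: 2, 2, 0, 0
```

If the frame might hold values other than 0 and 1, it may be safer to convert it to 0/1 first, or to pick whichever of the two semantics actually matches the intent.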
