How to conditionally drop rows in pandas

Tags:

python

pandas

I have the following dataframe:

        True_False  cum_val
Date        
2018-01-02  False   NaN
2018-01-03  False   0.006399
2018-01-04  False   0.010427
2018-01-05  False   0.017461
2018-01-08  False   0.019124
2018-01-09  False   0.020426
2018-01-10  False   0.019314
2018-01-11  False   0.026348
2018-01-12  False   0.033098
2018-01-16  False   0.029573
2018-01-17  False   0.038988
2018-01-18  False   0.037372
2018-01-19  False   0.041757
2018-01-22  False   0.049824
2018-01-23  False   0.051998
2018-01-24  False   0.051438
2018-01-25  False   0.052041
2018-01-26  False   0.063882
2018-01-29  False   0.057150
2018-01-30  True    -0.010899
2018-01-31  True    -0.010410
2018-02-01  True    -0.011058
2018-02-02  True    -0.032266
2018-02-05  True    -0.073246
2018-02-06  True    -0.055805
2018-02-07  True    -0.060806
2018-02-08  True    -0.098343
2018-02-09  True    -0.083407
2018-02-12  False   0.013915
2018-02-13  False   0.016528
2018-02-14  False   0.029930
2018-02-15  False   0.041999
2018-02-16  False   0.042373
2018-02-20  False   0.036531
2018-02-21  False   0.031035
2018-03-06  False   0.013671

How can I drop the rows starting at the second consecutive True value and continuing until the second False value that follows?

Such as for example:

        True_False  cum_val
Date        
2020-01-21  False   0.022808
2020-01-22  False   0.023097
2020-01-23  True    0.001141
2020-01-24  True    -0.007901  # <- Start drop here since this is the second True
2020-01-27  True    -0.023632
2020-01-28  False   -0.013578
2020-01-29  False   -0.000867  # <- End drop here since this is the second False
2020-01-30  False   0.003134

Edit 1:

I would like to add 1 more condition on the new df:

2020-01-22  0.000289    False   
2020-01-23  0.001141    True    
2020-01-27  -0.015731   True    # <- Start Drop Here
2020-01-28  0.010054    True    
2020-01-29  -0.000867   False   
2020-01-30  0.003134    True    #<-End drop here
2020-02-03  0.007255    True

As you have mentioned in the comment: [True, True, True, False, True]

In this case the drop should still start at the second True value, but it should stop right after the first False, even though the value after that False has toggled back to True. If the next value is still True, keep dropping until the value after the False.


asked Jan 31 '20 05:01

Slartibartfast

4 Answers

Let's try using where with ffill and parameter limit=2 then boolean filtering:

df[~(df['True_False'].where(df['True_False']).ffill(limit=2).cumsum() > 1)]

Output:

|    | Date       | True_False   |   cum_val |
|----|------------|--------------|-----------|
|  0 | 2020-01-21 | False        |         1 |
|  1 | 2020-01-22 | False        |         2 |
|  2 | 2020-01-23 | True         |         3 |
|  7 | 2020-01-28 | False        |         8 |

Details:

First, convert the False values to np.nan using where.
Next, fill the first two np.nan values after the last True using ffill(limit=2).
Now use cumsum to count the consecutive True values and select those greater than 1.
Finally, negate the mask to keep the False records above the first True record, that first True record itself, and the third False record onward.
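
The steps above can be traced one at a time. The following is only a sketch: the sample frame is rebuilt by hand to match the output table, the intermediate names (`marked`, `filled`, `running`, `drop_mask`) are made up for illustration, and the cast to float before `where` is an assumption added here to keep the dtype numeric rather than part of the one-liner.

```python
import pandas as pd

# Hand-built sample data matching the output table above.
df = pd.DataFrame({
    "Date": [f"2020-01-{day}" for day in range(21, 29)],
    "True_False": [False, False, True, True, True, False, False, False],
    "cum_val": list(range(1, 9)),
})
flags = df["True_False"]

# Step 1: keep 1.0 where the flag is True, NaN where it is False.
# (The float cast is an assumption of this sketch, not part of the one-liner.)
marked = flags.astype(float).where(flags)

# Step 2: carry the last True forward over at most two following False rows.
filled = marked.ffill(limit=2)

# Step 3: running count of the marked rows; unmarked rows stay NaN.
running = filled.cumsum()

# Step 4: rows counted above 1 form the drop region (NaN > 1 is False).
drop_mask = running > 1

result = df[~drop_mask]
print(result)
```

Note that `cumsum` skips NaN without resetting the total, so if the data contains more than one run of True values, the first True of a later run would also be counted above 1 and dropped.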


answered Oct 18 '22 22:10

Scott Boston

Here's what I tried. The data I created is:

    Date    True_False  cum_val
0   2020-01-21  False   1
1   2020-01-22  False   2
2   2020-01-23  True    3
3   2020-01-24  True    4
4   2020-01-25  True    5
5   2020-01-26  False   6
6   2020-01-27  False   7
7   2020-01-28  False   8

true_count = 0
false_count = 0
drop_continue = False
to_drop = []  # index labels of the rows to remove

for index, row in df.iterrows():
    if row['True_False'] and not drop_continue:
        true_count += 1
        if true_count == 2:
            # second True seen: start dropping from this row
            drop_continue = True
            to_drop.append(index)
            true_count = 0
            continue
    if drop_continue:
        if row['True_False']:
            to_drop.append(index)
        else:
            false_count += 1
            if false_count < 2:
                # first False after the True run is still dropped
                to_drop.append(index)
            else:
                # second False: stop dropping and keep this row
                drop_continue = False
                false_count = 0

# drop everything at once instead of mutating df while iterating
df.drop(to_drop, inplace=True)

Output

    Date    True_False  cum_val
0   2020-01-21  False   1
1   2020-01-22  False   2
2   2020-01-23  True    3
6   2020-01-27  False   7
7   2020-01-28  False   8

answered Oct 18 '22 21:10

Vishakha Lall

You could use Series.shift and Series.bfill:

df = df[~df['True_False'].shift().bfill()]

print(df)                                                               
         Date  True_False   cum_val
0  2020-01-21       False  0.022808
1  2020-01-22       False  0.023097
2  2020-01-23        True  0.001141
6  2020-01-29       False -0.000867
7  2020-01-30       False  0.003134
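
To see why this mask works, it may help to look at the shifted column on its own. The sketch below rebuilds the question's small example by hand; `prev_flag` is a name introduced here for illustration, and `shift(fill_value=...)` is used as a stand-in for `shift().bfill()` on the assumption that the column has no missing values, which keeps the result boolean.

```python
import pandas as pd

# Hand-built copy of the small example from the question.
df = pd.DataFrame({
    "Date": ["2020-01-21", "2020-01-22", "2020-01-23", "2020-01-24",
             "2020-01-27", "2020-01-28", "2020-01-29", "2020-01-30"],
    "True_False": [False, False, True, True, True, False, False, False],
    "cum_val": [0.022808, 0.023097, 0.001141, -0.007901,
                -0.023632, -0.013578, -0.000867, 0.003134],
})
flags = df["True_False"]

# Each row sees the flag of the row before it. The first row has no
# predecessor, so it reuses its own flag, which is what bfill would give.
prev_flag = flags.shift(fill_value=bool(flags.iloc[0]))

# A row is kept only when the previous row's flag was False.
result = df[~prev_flag]
print(result)
```

Every row that directly follows a True is dropped, so the first True of the run survives, while the rest of the run and the first False after it are removed.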

answered Oct 18 '22 22:10

dkhara

You can do:

import numpy as np

#mark start of the area you want to drop
df["dropit"]=np.where(df["True_False"] & df["True_False"].shift(1) & np.logical_not(df["True_False"].shift(2)), "start", None)

#mark the end of the drop area
df["dropit"]=np.where(np.logical_not(df["True_False"].shift(1)) & df["True_False"].shift(2), "end", df["dropit"])

#indicate gaps between the different drop areas:
df.loc[df["dropit"].shift().eq("end")&df["dropit"].ne("start"), "dropit"]="keep"

#forward fill
df["dropit"]=df["dropit"].ffill()

#drop marked drop areas and drop "dropit" column
df=df.drop(df.loc[df["dropit"].isin(["start", "end"])].index, axis=0).drop("dropit", axis=1)

Outputs:

            True_False   cum_val
Date
2018-01-02       False       NaN
2018-01-03       False  0.006399
2018-01-04       False  0.010427
2018-01-05       False  0.017461
2018-01-08       False  0.019124
2018-01-09       False  0.020426
2018-01-10       False  0.019314
2018-01-11       False  0.026348
2018-01-12       False  0.033098
2018-01-16       False  0.029573
2018-01-17       False  0.038988
2018-01-18       False  0.037372
2018-01-19       False  0.041757
2018-01-22       False  0.049824
2018-01-23       False  0.051998
2018-01-24       False  0.051438
2018-01-25       False  0.052041
2018-01-26       False  0.063882
2018-01-29       False  0.057150
2018-01-30        True -0.010899
2018-02-14       False  0.029930
2018-02-15       False  0.041999
2018-02-16       False  0.042373
2018-02-20       False  0.036531
2018-02-21       False  0.031035
2018-03-06       False  0.013671

answered Oct 18 '22 21:10

Grzegorz Skibinski
