I have searched SO, and although there are many questions and answers about conditionally removing rows, none of them fits my problem.
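I have a data.frame containing longitudinal measurements of variables x, y, etc., at various time points (time), in several subjects (id). Some subjects experience an event ev (denoted as 1, otherwise 0) at some time. I would like to reduce the initial data.frame so that subjects who never experience the event keep all of their rows, and subjects who do experience it keep only the rows that come before the event (in row order). For example,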
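# Example data: three subjects (A, B, C) with four rows each
testdf <- data.frame(id = c(rep("A", 4), rep("B", 4), rep("C", 4)),
                     x = c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5),
                     y = rev(c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5)),
                     time = c(1, 2, 3, 4, 0.1, 0.5, 10, 20, 3, 2, 1, 0.5),
                     # ev = 1 marks the row where the event occurs
                     ev = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1))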
would reduce to
  id  x  y time ev
1  A NA  5  1.0  0
2  A NA NA  2.0  0
3  A  1 NA  3.0  0
4  A  2  2  4.0  0
5  B  3  1  0.1  0
6  C  2  2  3.0  0
7  C NA  1  2.0  0
8  C NA NA  1.0  0
Pandas gives data analysts a way to delete rows from a data frame with the DataFrame.drop() method. Passing it the index labels of the rows that match a condition drops those rows.
To remove rows based on multiple conditions, put the conditions inside square brackets [ ] after the dataframe, combined with the & (AND) or | (OR) operator. This selects only the rows that satisfy the combined condition and leaves out all the others.
In R, we can use the subset() function to drop rows based on a condition. If we prefer to work with the tidyverse, the filter() function from the dplyr package removes (or selects) rows based on the values in a column, in the same way as subset().
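To make the comparison concrete, here is a minimal sketch applying both functions to a cut-down copy of the example data. It assumes the dplyr package is installed, and the names kept_base and kept_dplyr are made up for illustration. Note that this plain condition only discards the event rows themselves; it does not yet discard the rows that follow an event, which is what the question asks for.

```r
# Cut-down copy of the example data from the question
testdf <- data.frame(
  id   = c(rep("A", 4), rep("B", 4), rep("C", 4)),
  time = c(1, 2, 3, 4, 0.1, 0.5, 10, 20, 3, 2, 1, 0.5),
  ev   = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)
)

# Base R: subset() keeps the rows for which the condition is TRUE
kept_base <- subset(testdf, ev == 0)

# dplyr: filter() applies the same condition
# (called with the dplyr:: prefix to avoid confusion with stats::filter)
kept_dplyr <- dplyr::filter(testdf, ev == 0)
```

Both calls return the same ten rows. The only visible difference is that subset() keeps the original row names, while filter() renumbers them.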
Here's a solution with subset and ave:
subset(testdf, !ave(ev, id, FUN = cumsum))
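To see why this works, the one-liner can be unpacked into steps. The sketch below is only an illustration of the same idea; the intermediate names events_so_far, keep and result are made up here and are not part of the answer.

```r
testdf <- data.frame(
  id   = c(rep("A", 4), rep("B", 4), rep("C", 4)),
  x    = c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5),
  y    = rev(c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5)),
  time = c(1, 2, 3, 4, 0.1, 0.5, 10, 20, 3, 2, 1, 0.5),
  ev   = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)
)

# Running count of events within each subject, in row order:
#   A: 0 0 0 0    B: 0 1 1 1    C: 0 0 0 1
events_so_far <- ave(testdf$ev, testdf$id, FUN = cumsum)

# !0 is TRUE and ! of any non-zero number is FALSE, so this keeps
# exactly the rows where no event has occurred yet for that subject
keep <- !events_so_far

result <- testdf[keep, ]

# Optional: renumber the rows 1..8 as in the expected output
rownames(result) <- NULL
```

Because the count follows row order rather than the values in time, the rows of each subject need to be in the intended order before this is applied.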