Conditionally remove rows from dataframe (more than one conditions)

Tags: conditional

I have searched SO, and although there are many Q&As about conditionally removing rows, none of them fit my problem.

I have a data.frame containing longitudinal measurements of variables x, y, etc., at various time points (time), in several subjects (id). Some subjects experience an event ev (coded 1 at the time of the event, otherwise 0). I would like to reduce the initial data.frame to:

1) All rows for subjects that have not experienced an event (OK, that's easy), and also
2) For subjects that have experienced an event, all rows prior to the event (that is, all rows with times less than the time of that individual's event).

so that,

testdf <- data.frame(id = c(rep("A", 4), rep("B", 4), rep("C", 4)),
                     x = c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5),
                     y = rev(c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5)),
                     time = c(1, 2, 3, 4, 0.1, 0.5, 10, 20, 3, 2, 1, 0.5),
                     ev = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1))

would reduce to

   id  x  y time ev
1   A NA  5  1.0  0
2   A NA NA  2.0  0
3   A  1 NA  3.0  0
4   A  2  2  4.0  0
5   B  3  1  0.1  0
6   C  2  2  3.0  0
7   C NA  1  2.0  0
8   C NA NA  1.0  0

asked Jan 26 '13 14:01

ECII

1 Answer

Here's a solution with subset and ave:

subset(testdf, !ave(ev, id, FUN = cumsum))
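To make it clearer what the one-liner does, here is a rough step-by-step sketch using the data from the question. The names events_so_far, keep, by_row_order, first_ev_time and by_time are mine, chosen for illustration only. ave() with FUN = cumsum gives a running count of events within each id, and negating it keeps the rows where that count is still zero. Note that this works on row order, so it assumes the rows of each subject are already in the order you care about. In the question's expected output, subject C keeps the rows at times 3, 2 and 1 even though its event is at time 0.5, which matches row order rather than a strict comparison on time. If a comparison on time is what is actually wanted, the second half shows one possible variant (an assumption on my part, not part of the original answer).

```r
# Rebuild the example data from the question.
testdf <- data.frame(
  id   = c(rep("A", 4), rep("B", 4), rep("C", 4)),
  x    = c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5),
  y    = rev(c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5)),
  time = c(1, 2, 3, 4, 0.1, 0.5, 10, 20, 3, 2, 1, 0.5),
  ev   = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)
)

# Step 1: running count of events within each subject, in row order.
events_so_far <- ave(testdf$ev, testdf$id, FUN = cumsum)
# 0 0 0 0 0 1 1 1 0 0 0 1

# Step 2: keep only rows where no event has been seen yet.
# This is what !ave(ev, id, FUN = cumsum) does inside subset().
keep <- events_so_far == 0
by_row_order <- testdf[keep, ]

# Illustrative variant (my assumption): compare on time instead of row order.
# First event time per subject; Inf for subjects without any event,
# so that all of their rows pass the comparison below.
first_ev_time <- ave(
  ifelse(testdf$ev == 1, testdf$time, Inf),
  testdf$id,
  FUN = min
)
by_time <- testdf[testdf$time < first_ev_time, ]
```

With this data, by_row_order keeps the same eight rows as the expected output in the question, while by_time drops all of subject C because none of its rows has a time below 0.5. Which of the two is right depends on whether "prior to the event" means earlier rows or earlier times.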

answered Sep 17 '22 15:09

Sven Hohenstein
