 

Extract the last row for each subject from a data frame

Tags:

r

I have a data frame in R like this. I would like to extract the last visit for each subject.

SUBJID VISIT
 40161     3
 40161     4
 40161     5
 40161     6
 40161     9
 40201     3
 40202     6
 40202     8
 40241     3
 40241     4

The desired output is as follows:

SUBJID VISIT
 40161     9
 40201     3
 40202     8
 40241     4

How should I do this in R? Thanks very much for your help.

asked Feb 16 '13 05:02 by user2077677


3 Answers

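While agstudy's answer is correct, there is another way using the aggregate function from the stats package. Note that max returns the highest VISIT per subject, which is the last visit only if visit numbers increase over time.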

df <- read.table(text="SUBJID VISIT
40161 3
40161 4
40161 5
40161 6
40161 9
40201 3
40202 6
40202 8
40241 3
40241 4", header=TRUE)


aggregate(VISIT ~ SUBJID, df, max)

  SUBJID VISIT
1  40161     9
2  40201     3
3  40202     8
4  40241     4
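If "last" should mean the last row in the data's current order rather than the highest visit number, a minimal base R sketch (assuming the rows are already in the order you want) uses duplicated() with fromLast = TRUE. Unlike aggregate(), this keeps every column of the selected rows.

```r
# Sample data from the question, in its original row order
df <- read.table(text = "SUBJID VISIT
40161 3
40161 4
40161 5
40161 6
40161 9
40201 3
40202 6
40202 8
40241 3
40241 4", header = TRUE)

# duplicated(..., fromLast = TRUE) is FALSE only for the last occurrence of
# each SUBJID, so negating it keeps one row per subject: the row that comes
# last in the current order. All columns of that row are kept.
last_rows <- df[!duplicated(df$SUBJID, fromLast = TRUE), ]
rownames(last_rows) <- NULL  # renumber rows 1..n for tidy printing
print(last_rows)
#   SUBJID VISIT
# 1  40161     9
# 2  40201     3
# 3  40202     8
# 4  40241     4
```

Because it relies on row order, sort the data first (for example with order()) if it might not be in visit order.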
answered Sep 18 '22 01:09 by N8TRO


To show another alternative: you can also use data.table, which I like for the simplicity of its syntax. Assuming your data.frame is called "df":

library(data.table)
# data.table 1.8.7  For help type: help("data.table")
DT <- data.table(df, key = "SUBJID")
DT[, list(VISIT = max(VISIT)), by = key(DT)]
#    SUBJID VISIT
# 1:  40161     9
# 2:  40201     3
# 3:  40202     8
# 4:  40241     4

And, while we are sharing the many ways to do this in R, if you're comfortable with SQL syntax, you can also use sqldf as follows:

library(sqldf)
sqldf("select SUBJID, max(VISIT) `VISIT` from df group by SUBJID")
  SUBJID VISIT
1  40161     9
2  40201     3
3  40202     8
4  40241     4
answered Sep 21 '22 01:09 by A5C1D2H2I1M1N2O1R2T1


Because we can, another base option:

do.call(rbind,
        lapply(split(df, df$SUBJID),
               function(x) tail(x$VISIT, 1)))
#      [,1]
#40161    9
#40201    3
#40202    8
#40241    4
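The do.call(rbind, ...) call above returns a one-column matrix with the subject IDs as row names. If a two-column data frame like the desired output is preferred, one possible base R sketch uses tapply(), which splits VISIT by SUBJID and applies tail() to each piece. The object names last_visit and out are illustrative only.

```r
# Sample data from the question
df <- read.table(text = "SUBJID VISIT
40161 3
40161 4
40161 5
40161 6
40161 9
40201 3
40202 6
40202 8
40241 3
40241 4", header = TRUE)

# tapply() groups VISIT by SUBJID; tail(v, 1) takes the last element of each
# group, so row order (not the maximum) decides which visit is "last".
last_visit <- tapply(df$VISIT, df$SUBJID, function(v) tail(v, 1))

# Convert the named result into a two-column data frame.
out <- data.frame(SUBJID = as.integer(names(last_visit)),
                  VISIT  = as.vector(last_visit))
print(out)
#   SUBJID VISIT
# 1  40161     9
# 2  40201     3
# 3  40202     8
# 4  40241     4
```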

EDIT

As @BenBolker suggests:

do.call(rbind,
        lapply(split(df, df$SUBJID),
               function(x) tail(x, 1)))

This returns the whole last row of each subject, so it works for all columns if you have more.
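Since tail() picks the last row in whatever order the data happens to be in, it may help to sort by subject and visit first. A minimal sketch, using a hypothetical, deliberately unsorted subset of the question's data: without the order() step, this would return visit 3 for 40161 and visit 6 for 40202.

```r
# Hypothetical unsorted subset of the question's data (for illustration only)
df <- read.table(text = "SUBJID VISIT
40161 9
40161 3
40202 8
40201 3
40202 6", header = TRUE)

# Sort by subject, then by visit, so the last row within each subject
# is also its highest visit number.
df <- df[order(df$SUBJID, df$VISIT), ]

# Same split/lapply/tail pattern as above, applied to the sorted data.
res <- do.call(rbind, lapply(split(df, df$SUBJID), function(x) tail(x, 1)))
rownames(res) <- NULL  # drop the SUBJID-based row names
print(res)
#   SUBJID VISIT
# 1  40161     9
# 2  40201     3
# 3  40202     8
```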

answered Sep 22 '22 01:09 by user1317221_G