How to duplicate last row by group (ID)?

Tags:

I have a data.frame of surfaces touched over time. I would like to simply append a duplicate of the last row for each AcvitivityID:

head(movsdf.rbind)
  ActivityID CareType HCWType Orientation    Surface       Date     Time       Dev.Date.Time SurfaceCategories
1         01       IV    RN01  leftFacing AlcOutside 2019-08-03 11:08:01 2019-08-03 11:08:01       HygieneArea
2         01       IV    RN01  leftFacing         In 2019-08-03 11:08:12 2019-08-03 11:08:12                In
3         01       IV    RN01  leftFacing       Door 2019-08-03 11:08:12 2019-08-03 11:08:12        FarPatient
4         02       IV    RN01  leftFacing       Door 2019-08-03 11:08:18 2019-08-03 11:08:18        FarPatient
5         02       IV    RN01  leftFacing      Other 2019-08-03 11:08:22 2019-08-03 11:08:22        FarPatient
6         03       IV    RN01  leftFacing      Table 2019-08-03 11:10:26 2019-08-03 11:10:26       NearPatient

Example data:

movsdf.rbind<-data.frame(ActivityID=rep(1:4, each=10),Surface=rep(c("In","Table","Out"),each=10))

So I can get this to work from here :

repeatss <- aggregate(movsdf.rbind, by=list(movsdf.rbind$ActivityID), FUN = function(x) { last = tail(x,1) })

movsdf.rbind <-rbind(movsdf.rbind, repeatss)

This does the trick but it looks clunky and then the data is not in order (not that it really matters but I feel something more elegant might exist in dplyr or data.table). Any thoughts?

304

asked Aug 20 '19 11:08

HCAI

1 Answers

Another alternative using slice:

library(dplyr)

DF %>% 
  group_by(ActivityID) %>% 
  slice(c(1:n(),n()))

which gives:

# A tibble: 9 x 9
# Groups:   ActivityID [3]
  ActivityID CareType HCWType Orientation Surface    Date      Time     Dev.Date.Time     SurfaceCategori~
       <int> <chr>    <chr>   <chr>       <chr>      <chr>     <chr>    <chr>             <chr>           
1          1 IV       RN01    leftFacing  AlcOutside 2019-08-~ 11:08:01 2019-08-03 11:08~ HygieneArea     
2          1 IV       RN01    leftFacing  In         2019-08-~ 11:08:12 2019-08-03 11:08~ In              
3          1 IV       RN01    leftFacing  Door       2019-08-~ 11:08:12 2019-08-03 11:08~ FarPatient      
4          1 IV       RN01    leftFacing  Door       2019-08-~ 11:08:12 2019-08-03 11:08~ FarPatient      
5          2 IV       RN01    leftFacing  Door       2019-08-~ 11:08:18 2019-08-03 11:08~ FarPatient      
6          2 IV       RN01    leftFacing  Other      2019-08-~ 11:08:22 2019-08-03 11:08~ FarPatient      
7          2 IV       RN01    leftFacing  Other      2019-08-~ 11:08:22 2019-08-03 11:08~ FarPatient      
8          3 IV       RN01    leftFacing  Table      2019-08-~ 11:10:26 2019-08-03 11:10~ NearPatient     
9          3 IV       RN01    leftFacing  Table      2019-08-~ 11:10:26 2019-08-03 11:10~ NearPatient

Two base R alternatives:

# one
lastrows <- cumsum(aggregate(CareType ~ ActivityID, DF, length)[[2]])
DF[sort(c(seq(nrow(DF)), lastrows)),]

# two
idx <- unlist(tapply(1:nrow(DF), DF$ActivityID, FUN = function(x) c(x, tail(x, 1))))
DF[idx,]

which both give the same result.

Two data.table alternatives:

library(data.table)
setDT(DF)          # convert 'DF' to a data.table

# one
DF[DF[, .I[c(1:.N,.N)], by = ActivityID]$V1]

# two
DF[, .SD[c(1:.N,.N)], by = ActivityID]

Used data:

DF <- structure(list(ActivityID = c(1L, 1L, 1L, 2L, 2L, 3L),
                     CareType = c("IV", "IV", "IV", "IV", "IV", "IV"),
                     HCWType = c("RN01", "RN01", "RN01", "RN01", "RN01", "RN01"),
                     Orientation = c("leftFacing", "leftFacing", "leftFacing", "leftFacing", "leftFacing", "leftFacing"),
                     Surface = c("AlcOutside", "In", "Door", "Door", "Other", "Table"),
                     Date = c("2019-08-03", "2019-08-03", "2019-08-03", "2019-08-03", "2019-08-03", "2019-08-03"),
                     Time = c("11:08:01", "11:08:12", "11:08:12", "11:08:18", "11:08:22", "11:10:26"),
                     Dev.Date.Time = c("2019-08-03 11:08:01", "2019-08-03 11:08:12", "2019-08-03 11:08:12", "2019-08-03 11:08:18", "2019-08-03 11:08:22", "2019-08-03 11:10:26"),
                     SurfaceCategories = c("HygieneArea", "In", "FarPatient", "FarPatient", "FarPatient", "NearPatient")),
                class = "data.frame", row.names = c(NA, -6L))

142

answered Oct 13 '22 20:10

Jaap

Related questions
                            
                                Getting rid of border in pdf output for geom_label for ggplot2 in R
                            
                                Order multiple variables in ggplot2
                            
                                Wider margins for grid.arrange function
                            
                                Table including explicit NAs in R > 3.4.0
                            
                                Aggregating values on a data tree with R
                            
                                How to get sha of current git commit from R
                            
                                circle around a geographic point with st_buffer
                            
                                How to set class_weight in keras package of R?
                            
                                Error faced while using TM package's VCorpus in R
                            
                                Aligning labels with ggrepel
                            
                                Add annotation and segments to groups of legend elements
                            
                                Visualising a three way interaction between two continuous variables and one categorical variable in R
                            
                                I can't find "Knit" button in RStudio
                            
                                add CSS class to shiny textOutput
                            
                                count number of digits in a string in r
                            
                                Creating a ts time series with missing values from a data frame
                            
                                ggplot increase border line thickness
                            
                                Programmatically switch package in `::` call in R
                            
                                Summarize data at different aggregate levels - R and tidyverse
                            
                                adding multiple layers to a ggplot with a function

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to duplicate last row by group (ID)?

Tags:

r

data.table

dplyr

tidyverse

HCAI

People also ask

1 Answers

Jaap

Recent Activity

Donate For Us