Selecting observations within a data frame and reversing their order

Tags:

dataframe

r

I have a large data frame containing many time-correlated observations of several variables on a few hundred individuals. Each individual has a unique number in the ID column. To ask my question, I will use the simulated data below, which is structured similarly to my data:

set.seed(123)
dat <- data.frame(ID = rep(letters[1:10], each = 10),
                  time = rep(c(1:10), times = 10),
                  var1 = rnorm(100))

Note that in the real data, the number of observations differs for each ID. Say I needed to take the observations for a few individuals (e.g., IDs b, e, and g) and completely "flip" or "reverse" their order, while keeping each row's data attached to its time. By this I mean (using individual b as an example) that the first observation in the data frame for individual b would be the data at time interval 10 instead of time interval 1. In other words, the data would look like this:

ID   time   var1
a     1
a     2
…     … 
a     10 
b     10
b     9 
b     8
…     …
b     1
c     1
c     2
c     3
c     4
etc.

What is the safest way to do this while keeping each individual's position in the data frame (i.e., b stays between a and c, etc.)?

Ryan asked May 19 '20 19:05



2 Answers

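Using data.table, reverse the rows of each selected ID's subset (.SD) and leave the other groups untouched:

library(data.table)
setDT(dat)  # convert dat to a data.table by reference
ids.to.reverse <- c('b', 'e', 'g')

# Within each ID group, .N is that group's row count, so .SD[.N:1] flips its rows
dat[, if (ID %in% ids.to.reverse) .SD[.N:1] else .SD, by = 'ID']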
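
Because .N is evaluated per group, the same expression should also cope with IDs that have different numbers of rows, as in the real data. A minimal sketch on a small made-up table (the names toy, val, and res are hypothetical and only used for illustration):

```r
library(data.table)

# Hypothetical unbalanced data: a has 3 rows, b has 2, c has 4.
toy <- data.table(ID   = c('a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c'),
                  time = c(1:3, 1:2, 1:4))
toy[, val := seq_len(.N) * 10]   # 10, 20, ..., 90 so each row is easy to track

ids.to.reverse <- c('b', 'c')

# Flip the rows of the selected groups; other groups pass through unchanged.
res <- toy[, if (ID %in% ids.to.reverse) .SD[.N:1] else .SD, by = 'ID']

res$ID    # "a" "a" "a" "b" "b" "c" "c" "c" "c"  (group positions unchanged)
res$time  # 1 2 3 2 1 4 3 2 1
res$val   # 10 20 30 50 40 90 80 70 60           (values travel with their time)
```

The result is a new data.table; assign it to a name (as with res here) if it needs to be kept.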
MattB answered Nov 15 '22 13:11


One option is to group_split by ID, then loop over the resulting list with map_dfr and arrange by descending time only for the groups whose ID is %in% c('b', 'e', 'g'):

library(dplyr)
library(purrr)
out <- dat %>%
  group_split(ID) %>%
  # sort a group by descending time only if its ID is one of the selected ones
  map_dfr(~ if (first(.x$ID) %in% c('b', 'e', 'g')) arrange(.x, desc(time)) else .x)

out %>% 
   filter(ID %in% c('a', 'b'))
# A tibble: 20 x 3
#   ID     time    var1
#   <fct> <int>   <dbl>
# 1 a         1 -0.560 
# 2 a         2 -0.230 
# 3 a         3  1.56  
# 4 a         4  0.0705
# 5 a         5  0.129 
# 6 a         6  1.72  
# 7 a         7  0.461 
# 8 a         8 -1.27  
# 9 a         9 -0.687 
#10 a        10 -0.446 
#11 b        10 -0.473 
#12 b         9  0.701 
#13 b         8 -1.97  
#14 b         7  0.498 
#15 b         6  1.79  
#16 b         5 -0.556 
#17 b         4  0.111 
#18 b         3  0.401 
#19 b         2  0.360 
#20 b         1  1.22  

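Alternatively, arrange can do this in one step with a small trick: multiply time by -1 for the IDs 'b', 'e', 'g' and by 1 for the rest, so those groups sort in descending order. Note that this sorts by ID and time rather than reversing the existing row order, so it matches the result above only when the data are already ordered by ID and time, as here.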

out1 <- dat %>%
     arrange(ID,  time * c(1, -1)[c(1 + (ID %in% c('b', 'e', 'g')))])

Checking that the two results match:

all.equal(out, out1, check.attributes = FALSE)
#[1] TRUE
akrun answered Nov 15 '22 11:11
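
The same reversal can also be sketched in base R without extra packages. This is only an illustration on the question's simulated data, not one of the posted answers; the names ids_to_reverse, idx, flip, and out_base are hypothetical. It reverses the existing row positions within each selected ID, so it does not rely on the time column being sorted:

```r
# Simulated data from the question.
set.seed(123)
dat <- data.frame(ID = rep(letters[1:10], each = 10),
                  time = rep(1:10, times = 10),
                  var1 = rnorm(100))

ids_to_reverse <- c("b", "e", "g")

idx  <- seq_len(nrow(dat))            # current row positions
flip <- dat$ID %in% ids_to_reverse    # rows belonging to the selected IDs

# ave() applies rev() to the row positions within each selected ID,
# leaving all other positions as they are.
idx[flip] <- ave(idx[flip], dat$ID[flip], FUN = rev)

out_base <- dat[idx, ]
rownames(out_base) <- NULL            # drop the permuted row names

out_base$time[out_base$ID == "b"]     # 10 9 8 7 6 5 4 3 2 1
```

Because only row positions are permuted, every row keeps its own time and var1 values, and each ID block stays where it was in the data frame.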