Selecting observations within a data frame and reversing their order

Tags:

dataframe

r

I have a large data frame containing many time-correlated observations of several variables on a few hundred individuals. Each individual has a unique identifier in the ID column. To ask my question, I will use the simulated data below, which is structured like my real data:

set.seed(123)
dat <- data.frame(ID = rep(letters[1:10], each = 10),
                  time = rep(c(1:10), times = 10),
                  var1 = rnorm(100))

Note that in the real data, the number of observations differs for each ID. Suppose that for a few individuals (e.g., IDs b, e, and g) I need to completely "flip" or "reverse" the order of their observations, while keeping each row's data attached to its time. Using individual b as an example, the first observation in the data frame for b would be the data at time 10 instead of time 1. In other words, the data would look like this:

ID   time   var1
a     1
a     2
…     … 
a     10 
b     10
b     9 
b     8
…     …
b     1
c     1
c     2
c     3
c     4
etc.

What is the safest way to do this while keeping each individual's block of rows in its original position in the data frame (i.e., b stays between a and c, etc.)?

asked May 19 '20 19:05

Ryan

2 Answers

Using data.table:

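library(data.table)
setDT(dat)  # convert dat to a data.table by reference
ids.to.reverse <- c('b', 'e', 'g')

# Within each ID group, reverse the row order if the ID is flagged; otherwise keep it
dat[, if (ID %in% ids.to.reverse) .SD[.N:1] else .SD, by = 'ID']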
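
As a quick sanity check of the data.table approach above, here is a minimal sketch that rebuilds the simulated data from the question and inspects the result. The name `res` is only for illustration. It relies on the fact that `by` (unlike `keyby`) returns groups in order of first appearance, so each ID's block stays where it was.

```r
library(data.table)

# Simulated data from the question, built directly as a data.table
set.seed(123)
dat <- data.table(ID   = rep(letters[1:10], each = 10),
                  time = rep(1:10, times = 10),
                  var1 = rnorm(100))

ids.to.reverse <- c("b", "e", "g")

# 'res' is a hypothetical name for the reordered table
res <- dat[, if (ID %in% ids.to.reverse) .SD[.N:1] else .SD, by = "ID"]

res[ID == "b", time]   # 10 9 8 7 6 5 4 3 2 1
unique(res$ID)         # groups still appear in the order a, b, c, ...
```

Because `.SD[.N:1]` reverses the rows as they currently stand, this assumes the rows within each ID are already in the order you want to flip; it does not sort by `time`.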

answered Nov 15 '22 13:11

MattB

One option is to group_split by ID, then loop over the resulting list with map and arrange each piece in descending time only if its ID is one of 'b', 'e', 'g' (checked with %in%):

library(dplyr)
library(purrr)
out <- dat %>%
  group_split(ID) %>%
  map_dfr(~ if (first(.x$ID) %in% c('b', 'e', 'g')) {
    arrange(.x, desc(time))  # reverse this ID's rows by time
  } else {
    .x                       # leave other IDs unchanged
  })

out %>% 
   filter(ID %in% c('a', 'b'))
# A tibble: 20 x 3
#   ID     time    var1
#   <fct> <int>   <dbl>
# 1 a         1 -0.560 
# 2 a         2 -0.230 
# 3 a         3  1.56  
# 4 a         4  0.0705
# 5 a         5  0.129 
# 6 a         6  1.72  
# 7 a         7  0.461 
# 8 a         8 -1.27  
# 9 a         9 -0.687 
#10 a        10 -0.446 
#11 b        10 -0.473 
#12 b         9  0.701 
#13 b         8 -1.97  
#14 b         7  0.498 
#15 b         6  1.79  
#16 b         5 -0.556 
#17 b         4  0.111 
#18 b         3  0.401 
#19 b         2  0.360 
#20 b         1  1.22

Alternatively, arrange can be used in a slightly hacky way: sort on time multiplied by -1 for the IDs 'b', 'e', 'g' and by 1 for the rest, so that only those IDs end up in descending order:

out1 <- dat %>%
     arrange(ID,  time * c(1, -1)[c(1 + (ID %in% c('b', 'e', 'g')))])

Checking that both approaches give the same result:

all.equal(out, out1, check.attributes = FALSE)
#[1] TRUE
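
For comparison, here is a base R sketch of the same idea that needs no packages. This is not part of the answer above; it is an alternative under stated assumptions. The names `ids_to_reverse`, `idx`, and `out_base` are hypothetical. Instead of sorting, it reverses row positions within each flagged ID using `ave()`, which writes each group's result back into that group's original slots, so IDs are never re-sorted and unequal group sizes are fine.

```r
# Simulated data from the question
set.seed(123)
dat <- data.frame(ID   = rep(letters[1:10], each = 10),
                  time = rep(1:10, times = 10),
                  var1 = rnorm(100))

ids_to_reverse <- c("b", "e", "g")  # hypothetical name

# For each ID, take that group's row numbers and reverse them if the ID is flagged.
# ave() places each group's (possibly reversed) row numbers back in the group's positions.
idx <- ave(seq_len(nrow(dat)), dat$ID, FUN = function(i) {
  if (dat$ID[i[1]] %in% ids_to_reverse) rev(i) else i
})

out_base <- dat[idx, ]
rownames(out_base) <- NULL  # drop the old row names after reordering

out_base$time[out_base$ID == "b"]   # 10 9 8 7 6 5 4 3 2 1
```

Note the design difference: this flips the existing row order rather than sorting by `time`, so the two agree only when each ID's rows are already in ascending `time` order, as they are in the simulated data.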

answered Nov 15 '22 11:11

akrun
