I have a huge data frame that contains many time-correlated observations of several variables on a few hundred individuals. Each individual has a unique identifier in the ID column. I will use the simulated data below, which is structured like my real data, to ask my question:
set.seed(123)
# 10 individuals (a-j), each observed at time points 1-10
dat <- data.frame(ID = rep(letters[1:10], each = 10),
                  time = rep(1:10, times = 10),
                  var1 = rnorm(100))
Note that in the real data, the number of observations differs between IDs.
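Because group sizes differ in the real data, it may help to test any answer on an unbalanced version of the example. This is a minimal sketch that assumes group sizes drawn at random between 5 and 15. The names `dat_unbal`, `ids` and `n_obs` are hypothetical and exist only for illustration.

```r
# Hypothetical unbalanced version of the example data: each ID gets a
# different number of time points (group sizes drawn at random).
set.seed(123)
ids   <- letters[1:10]
n_obs <- sample(5:15, length(ids), replace = TRUE)  # observations per ID
dat_unbal <- data.frame(
  ID   = rep(ids, times = n_obs),
  time = unlist(lapply(n_obs, seq_len)),             # 1..n within each ID
  var1 = rnorm(sum(n_obs))
)
table(dat_unbal$ID)  # shows the differing group sizes
```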
Say there were a few individuals (e.g., IDs b, e, and g) whose observations I needed to completely "flip" or "reverse" in order, while still keeping each observation's data attached to its time. By this I mean (using individual b as an example) that the first observation in the data frame for individual b would be the data at time interval 10 instead of time interval 1. In other words, the data would look like this:
ID  time  var1
a   1
a   2
…   …
a   10
b   10
b   9
b   8
…   …
b   1
c   1
c   2
c   3
c   4
etc.
What is the safest way to do this while keeping each individual in its original position in the data frame (i.e., b stays between a and c, etc.)?
The rev() function in R returns its argument in reverse order. Because a data frame is a list of columns, applying rev() to a data frame reverses the columns, not the rows: the last column comes first, and the row order is unchanged. So rev() alone cannot flip the observations.
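A short illustration on the example data: rev() reverses the columns, whereas reversing rows needs row indexing. The object names `rev_cols` and `rev_rows` are only for illustration.

```r
set.seed(123)
dat <- data.frame(ID = rep(letters[1:10], each = 10),
                  time = rep(1:10, times = 10),
                  var1 = rnorm(100))

# rev() on a data frame reverses the columns (a data frame is a list of columns)
rev_cols <- rev(dat)
names(rev_cols)         # "var1" "time" "ID"

# Reversing the rows needs row indexing instead
rev_rows <- dat[rev(seq_len(nrow(dat))), ]
head(rev_rows$time, 3)  # 10 9 8 (the last rows, belonging to ID "j", come first)
```

Note that this flips the whole data frame, including the order of the individuals, which is not what the question asks for.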
A data frame is a table with rows and columns. Each row is an observation, and each column holds one of the measures collected in the study, commonly called variables. Values in the same row belong to the same observation, and values in the same column belong to the same variable.
To change the row order of an R data frame, index it with single square brackets and supply the new row order as the first (row) index, e.g. dat[c(3, 1, 2), ].
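Building on row indexing, here is a base R sketch of the selective flip. It computes one ordering vector that keeps each ID block at the position where that ID first appears and reverses the existing row order only within the chosen IDs. It assumes the rows inside each block are already in the order you want reversed (here, ascending time). The helper names `grp`, `within`, `key` and `dat_flipped` are hypothetical.

```r
set.seed(123)
dat <- data.frame(ID = rep(letters[1:10], each = 10),
                  time = rep(1:10, times = 10),
                  var1 = rnorm(100))
ids_to_reverse <- c("b", "e", "g")

# Position of each ID's block, in order of first appearance (not alphabetical)
grp <- match(dat$ID, unique(dat$ID))
# Row number within each ID block; works for unequal group sizes
within <- ave(seq_len(nrow(dat)), dat$ID, FUN = seq_along)
# Negate the within-block index for the IDs to flip, so they sort in reverse
key <- ifelse(dat$ID %in% ids_to_reverse, -within, within)

dat_flipped <- dat[order(grp, key), ]
rownames(dat_flipped) <- NULL
```

Using match(ID, unique(ID)) rather than sorting on ID itself is what keeps the individuals in their original positions even if the real IDs are not stored alphabetically.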
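Using data.table:
library(data.table)
setDT(dat)  # convert dat to a data.table in place
ids.to.reverse <- c('b', 'e', 'g')
# Within each ID group, reverse the rows (.SD[.N:1]) for the selected IDs only;
# by = 'ID' keeps the groups in their original order of appearance
dat[, if (ID %in% ids.to.reverse) .SD[.N:1] else .SD, by = 'ID']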
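If the rows within an individual are not guaranteed to be in time order, a variant of the same data.table idea is to sort the selected groups by descending time instead of reversing their current row order. This is a sketch under that assumption; `res` is an illustrative name.

```r
library(data.table)
set.seed(123)
dat <- data.table(ID = rep(letters[1:10], each = 10),
                  time = rep(1:10, times = 10),
                  var1 = rnorm(100))
ids.to.reverse <- c("b", "e", "g")

# Inside j, the grouping column ID is a single value for the current group.
# Selected groups are sorted by descending time; the others are returned as is.
res <- dat[, if (ID %in% ids.to.reverse) .SD[order(-time)] else .SD, by = ID]

res[ID == "b", time]  # 10 9 8 ... 1
```

Unlike setDT(), the `[` call does not modify dat; assign the result (here to `res`) to keep it.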
One option is to group_split by ID, loop over the resulting list with map, and arrange a chunk by descending time when its ID is one of 'b', 'e', 'g' (checked with %in%):
library(dplyr)
library(purrr)
out <- dat %>%
  group_split(ID) %>%                   # one tibble per ID
  map_dfr(~ if (any(c('b', 'e', 'g') %in% first(.x$ID)))
    .x %>% arrange(desc(time))          # flip the selected IDs
  else .x)                              # leave the others unchanged
out %>%
filter(ID %in% c('a', 'b'))
# A tibble: 20 x 3
# ID time var1
# <fct> <int> <dbl>
# 1 a 1 -0.560
# 2 a 2 -0.230
# 3 a 3 1.56
# 4 a 4 0.0705
# 5 a 5 0.129
# 6 a 6 1.72
# 7 a 7 0.461
# 8 a 8 -1.27
# 9 a 9 -0.687
#10 a 10 -0.446
#11 b 10 -0.473
#12 b 9 0.701
#13 b 8 -1.97
#14 b 7 0.498
#15 b 6 1.79
#16 b 5 -0.556
#17 b 4 0.111
#18 b 3 0.401
#19 b 2 0.360
#20 b 1 1.22
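One caveat with this approach: group_split() returns the groups sorted by ID (for a factor, in level order), not in order of first appearance. With the sample data the IDs are already alphabetical, so nothing moves, but if the real IDs are not stored in sorted order the individuals could change position. A minimal sketch, assuming the same structure as `dat`, turns ID into a factor whose levels follow first appearance before splitting. `out_keep` is an illustrative name, and ID comes back as a factor.

```r
library(dplyr)
library(purrr)
set.seed(123)
dat <- data.frame(ID = rep(letters[1:10], each = 10),
                  time = rep(1:10, times = 10),
                  var1 = rnorm(100))

out_keep <- dat %>%
  # factor levels in order of first appearance, so group_split() keeps that order
  mutate(ID = factor(ID, levels = unique(ID))) %>%
  group_split(ID) %>%
  map_dfr(~ if (first(.x$ID) %in% c("b", "e", "g"))
    arrange(.x, desc(time)) else .x)
```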
Or we can use arrange in a hacky way, i.e. negate time for the IDs 'b', 'e', 'g' while leaving it positive for the rest:
out1 <- dat %>%
  # c(1, -1)[1 + TRUE] is -1, so the selected IDs sort by descending time
  arrange(ID, time * c(1, -1)[1 + (ID %in% c('b', 'e', 'g'))])
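The indexing expression only picks a sign, so an arguably more readable equivalent uses dplyr's if_else(). This sketch checks that both forms give the same ordering. Note that arrange(ID, …) also sorts the individuals by ID, which preserves their positions only when the data are already sorted by ID. The names `out_sign` and `out_index` are illustrative.

```r
library(dplyr)
set.seed(123)
dat <- data.frame(ID = rep(letters[1:10], each = 10),
                  time = rep(1:10, times = 10),
                  var1 = rnorm(100))
ids_to_reverse <- c("b", "e", "g")

# Negate time for the IDs to flip, so ascending order puts the latest time first
out_sign <- dat %>%
  arrange(ID, if_else(ID %in% ids_to_reverse, -time, time))

# The indexing form c(1, -1)[1 + flag] selects the same sign
out_index <- dat %>%
  arrange(ID, time * c(1, -1)[1 + (ID %in% ids_to_reverse)])

identical(out_sign, out_index)  # TRUE
```

The negated value is used only as a sort key; the time column itself is left unchanged.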
Checking that both approaches give the same result:
all.equal(out, out1, check.attributes = FALSE)
#[1] TRUE
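all.equal() only shows that two results agree with each other. To check a result against the original data, a hypothetical helper `check_flip()` could test three things: the same rows with var1 still paired with its ID and time, the same order of individuals, and strictly decreasing time only within the flipped IDs. This is a sketch; note that an ID with a single observation counts as "decreasing".

```r
set.seed(123)
dat <- data.frame(ID = rep(letters[1:10], each = 10),
                  time = rep(1:10, times = 10),
                  var1 = rnorm(100))
ids <- c("b", "e", "g")

# Hypothetical helper: TRUE if `res` is `orig` with only the `ids` blocks flipped
check_flip <- function(orig, res, ids) {
  key <- function(d) d[order(as.character(d$ID), d$time), c("ID", "time", "var1")]
  # 1. same rows, each var1 still attached to its original ID and time
  same_rows <- isTRUE(all.equal(key(orig), key(res), check.attributes = FALSE))
  # 2. ID blocks appear in the same order as in the original
  same_order <- identical(as.character(unique(res$ID)),
                          as.character(unique(orig$ID)))
  # 3. time strictly decreasing within the flipped IDs and not within the others
  desc_time <- tapply(res$time, as.character(res$ID),
                      function(t) all(diff(t) < 0))
  flipped_ok <- all(desc_time[ids]) &&
    !any(desc_time[setdiff(names(desc_time), ids)])
  same_rows && same_order && flipped_ok
}

# Example: flip with the sign trick in base R, then check it
res <- dat[order(dat$ID, ifelse(dat$ID %in% ids, -dat$time, dat$time)), ]
check_flip(dat, res, ids)  # TRUE
check_flip(dat, dat, ids)  # FALSE (nothing was flipped)
```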