I would appreciate some help with the following task: from the data frame below (C), for each id I would like to subtract the first entry under column d_2 from the final entry and then store the results in another data frame containing the same ids. I can then merge this with my initial data frame. Please note that the subtraction has to be in this order (last entry minus first entry for each id).
Here is the code:
id <- c("A1", "A1", "B10","B10", "B500", "B500", "C100", "C100", "C100", "D40", "D40", "G100", "G100")
d_1 <- c( rep(1.15, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))
set.seed(2)
d_2 <- round(runif(13, -1, 1), 2)
C <- data.frame(id, d_1, d_2)
id d_1 d_2
A1 1.15 -0.63
A1 1.15 0.40
B10 1.44 0.15
B10 1.44 -0.66
B500 1.34 0.89
B500 1.34 0.89
C100 1.50 -0.74
C100 1.50 0.67
C100 1.50 -0.06
D40 1.90 0.10
D40 1.90 0.11
G100 1.59 -0.52
G100 1.59 0.52
Desired result:
id2 <- c("A1", "B10", "B500", "C100", "D40", "G100")
difference <- c(1.03, -0.81, 0, 0.68, 0.01, 1.04)
diff_df <- data.frame(id2, difference)
id2 difference
A1 1.03
B10 -0.81
B500 0.00
C100 0.68
D40 0.01
G100 1.04
I attempted this by using ddply to obtain the first and last entries, but I'm really struggling with the function argument in the second call (below) to get the desired outcome.
C_1 <- ddply(C, .(id), function(x) x[c(1, nrow(x)), ])
ddply(C_1, .(patient), function )
To be honest, I'm not very familiar with ddply (from the plyr package); I got the code above from another post on Stack Exchange.
My original data is a groupedData object, and I believe another way of approaching this is to use gapply, but again I'm struggling with the third argument here (usually a function):
grouped_C <- groupedData(d_1 ~ d_2 | id, data = C, FUN = mean, labels = list( x = "", y = ""), units = list(""))
x1 <- gapply(grouped_C, "d_2", first_entry)
x2 <- gapply(grouped_C, "d_2", last_entry)
where first_entry and last_entry are functions that return the first and last entries. I can then get the difference with x2 - x1. However, I'm not sure what to use for first_entry and last_entry in the above code (perhaps something to do with head or tail?).
Any help would be much appreciated.
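On the head/tail idea raised above: a minimal base R sketch, assuming the rows of each id are already in the order you want. It swaps in tapply rather than gapply or ddply, and the d_2 values are typed in from the table above so the example does not depend on the random seed.

```r
# Rebuild the example data; d_2 values are copied from the printed table.
id <- c("A1", "A1", "B10", "B10", "B500", "B500", "C100", "C100", "C100",
        "D40", "D40", "G100", "G100")
d_2 <- c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89, -0.74, 0.67, -0.06,
         0.10, 0.11, -0.52, 0.52)
C <- data.frame(id, d_2)

# For each id, take the last d_2 value (tail) minus the first one (head).
diffs <- tapply(C$d_2, C$id, function(x) tail(x, 1) - head(x, 1))

# Turn the named result into a data frame with one row per id.
diff_df <- data.frame(id2 = names(diffs), difference = as.vector(diffs))
print(diff_df)
```

Here tail(x, 1) and head(x, 1) play the roles of last_entry and first_entry.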
This can be done easily with dplyr. The last and first functions are very helpful for this task.
library(dplyr)  # load dplyr (install it first if it is not installed yet)
# Store the result in a new data.frame, diff_df. The %>% operator chains the
# steps together, so you don't have to reference C again at each step.
diff_df <- C %>%
  group_by(id) %>%  # group the whole data.frame C by id
  summarize(difference = last(d_2) - first(d_2))  # one row per id: last d_2 minus first d_2
# id difference #this is the result stored in diff_df
#1 A1 1.03
#2 B10 -0.81
#3 B500 0.00
#4 C100 0.68
#5 D40 0.01
#6 G100 1.04
Edit note: updated post with %>% instead of %.%, which is deprecated.
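Since the question mentions merging the result back into the initial data frame, here is a minimal sketch of that step using dplyr's left_join. The name C_merged is only illustrative, and the d_2 values are typed in from the question's table rather than regenerated with set.seed.

```r
library(dplyr)

# Rebuild the example data; d_2 values are copied from the question's table.
id <- c("A1", "A1", "B10", "B10", "B500", "B500", "C100", "C100", "C100",
        "D40", "D40", "G100", "G100")
d_1 <- c(rep(1.15, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3),
         rep(1.90, 2), rep(1.59, 2))
d_2 <- c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89, -0.74, 0.67, -0.06,
         0.10, 0.11, -0.52, 0.52)
C <- data.frame(id, d_1, d_2)

# One row per id: last d_2 minus first d_2.
diff_df <- C %>%
  group_by(id) %>%
  summarize(difference = last(d_2) - first(d_2))

# Attach the per-id difference to every row of the original data frame.
# C_merged is a hypothetical name chosen for this sketch.
C_merged <- left_join(C, diff_df, by = "id")
print(C_merged)
```

If you only need the merged version, replacing summarize with mutate in the pipeline should give the same column directly, without the separate join.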
If you have any singletons and they need to be left alone, then this will solve your problem. It's the same as docendo discimus's answer, but with an if-else component to deal with the singleton cases:
library(dplyr)
diff_df <- C %>%
group_by(id) %>%
summarize(difference = if(n() > 1) last(d_2) - first(d_2) else d_2)
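To see what the if-else changes, here is a small sketch on made-up data: the id "Z1" and its value 0.25 are hypothetical and not part of the question's data. Without the if-else a singleton gives last minus first of the same value, which is 0; with it, the single d_2 value is kept as is.

```r
library(dplyr)

# Toy data: "A1" has two rows, "Z1" (hypothetical) has only one.
C_single <- data.frame(id = c("A1", "A1", "Z1"),
                       d_2 = c(-0.63, 0.40, 0.25))

# Plain version: a singleton yields last - first of the same value.
plain <- C_single %>%
  group_by(id) %>%
  summarize(difference = last(d_2) - first(d_2))

# if-else version: a singleton keeps its own d_2 value.
kept <- C_single %>%
  group_by(id) %>%
  summarize(difference = if (n() > 1) last(d_2) - first(d_2) else d_2)

print(plain)
print(kept)
```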