I have the following data,
data
date ID value1 value2
2016-04-03 1 0 1
2016-04-10 1 6 2
2016-04-17 1 7 3
2016-04-24 1 2 4
2016-04-03 2 1 5
2016-04-10 2 5 6
2016-04-17 2 9 7
2016-04-24 2 4 8
Now I want to group by ID and find the mean of value2 and latest value of value1. Latest value in the sense, I would like to get the value of latest date i.e. here I would like to get the value1 for corresponding value of 2016-04-24 for each IDs. My output should be like,
ID max_value1 mean_value2
1 2 2.5
2 4 6.5
The following is the command I am using,
data %>% group_by(ID) %>% summarize(mean_value2 = mean(value2))
But I am not sure how to do the first one. Can anybody help me in getting the latest value of value1 while summarizing in dplyr?
One way would be the following. My assumption here is that date
is a date object. You want to arrange the order of date for each ID using arrange
. Then, you group the data by ID
. In summarize
, you can use last()
to take the last value1 for each ID.
arrange(data,ID,date) %>%
group_by(ID) %>%
summarize(mean_value2 = mean(value2), max_value1 = last(value1))
# ID mean_value2 max_value1
# <int> <dbl> <int>
#1 1 2.5 2
#2 2 6.5 4
DATA
data <- structure(list(date = structure(c(16894, 16901, 16908, 16915,
16894, 16901, 16908, 16915), class = "Date"), ID = c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L), value1 = c(0L, 6L, 7L, 2L, 1L, 5L, 9L,
4L), value2 = 1:8), .Names = c("date", "ID", "value1", "value2"
), row.names = c(NA, -8L), class = "data.frame")
Here is an option with data.table
library(data.table)
setDT(data)[, .(max_value1 = value1[which.max(date)],
mean_value2 = mean(value2)) , by = ID]
# ID max_value1 mean_value2
#1: 1 2 2.5
#2: 2 4 6.5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With