
summarize in dplyr with the maximum value of the date - R

Tags: r, dplyr

I have the following data,

data

date           ID       value1        value2
2016-04-03     1          0              1
2016-04-10     1          6              2
2016-04-17     1          7              3
2016-04-24     1          2              4
2016-04-03     2          1              5
2016-04-10     2          5              6
2016-04-17     2          9              7
2016-04-24     2          4              8

Now I want to group by ID and compute the mean of value2 and the latest value of value1. By "latest" I mean the value1 on the most recent date; here, that is the value1 recorded on 2016-04-24 for each ID. My output should look like this:

ID       max_value1      mean_value2
1             2              2.5
2             4              6.5 

This is the command I am using:

data %>% group_by(ID) %>% summarize(mean_value2 = mean(value2))

But I am not sure how to do the first one. Can anybody help me in getting the latest value of value1 while summarizing in dplyr?

haimen asked Feb 06 '23 10:02


2 Answers

One way would be the following. I am assuming that date is a Date object. First, sort the rows by ID and date using arrange(), so that the last row for each ID holds the latest date. Then group the data by ID. In summarize(), you can use last() to take the last value1 for each ID.

library(dplyr)

# Sort by ID and date so the last row in each group has the latest date
arrange(data, ID, date) %>%
  group_by(ID) %>%
  summarize(mean_value2 = mean(value2), max_value1 = last(value1))

#     ID mean_value2 max_value1
#  <int>       <dbl>      <int>
#1     1         2.5          2
#2     2         6.5          4
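
As a variation on the approach above, the sort can be skipped by indexing value1 with which.max(date) inside summarize(). This is only a sketch: it rebuilds the question's sample data inline and assumes date is a Date column; the name result is just for illustration.

```r
library(dplyr)

# Sample data from the question; date is stored as a Date
data <- data.frame(
  date   = as.Date(rep(c("2016-04-03", "2016-04-10",
                         "2016-04-17", "2016-04-24"), 2)),
  ID     = rep(1:2, each = 4),
  value1 = c(0L, 6L, 7L, 2L, 1L, 5L, 9L, 4L),
  value2 = 1:8
)

result <- data %>%
  group_by(ID) %>%
  summarize(
    mean_value2 = mean(value2),
    # which.max(date) is the position of the latest date within the group
    max_value1  = value1[which.max(date)]
  )
```

If a group has several rows on its latest date, which.max() returns the first of them, so the result can differ from last() after sorting.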

DATA

data <- structure(list(date = structure(c(16894, 16901, 16908, 16915, 
16894, 16901, 16908, 16915), class = "Date"), ID = c(1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L), value1 = c(0L, 6L, 7L, 2L, 1L, 5L, 9L, 
4L), value2 = 1:8), .Names = c("date", "ID", "value1", "value2"
), row.names = c(NA, -8L), class = "data.frame")

jazzurro answered Feb 09 '23 00:02


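Here is an option with data.table. which.max(date) gives the position of the latest date within each ID, so value1 can be indexed directly without sorting first.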

library(data.table)
setDT(data)[,  .(max_value1 = value1[which.max(date)], 
                        mean_value2 = mean(value2)) , by = ID]
 #   ID max_value1 mean_value2
 #1:  1          2         2.5
 #2:  2          4         6.5
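
Note that setDT() converts data to a data.table by reference. If the original data.frame should stay untouched, a possible sketch is to work on a converted copy via as.data.table(). The sample data is rebuilt inline here, and the names dt and result are only illustrative.

```r
library(data.table)

# Sample data from the question; date is stored as a Date
data <- data.frame(
  date   = as.Date(rep(c("2016-04-03", "2016-04-10",
                         "2016-04-17", "2016-04-24"), 2)),
  ID     = rep(1:2, each = 4),
  value1 = c(0L, 6L, 7L, 2L, 1L, 5L, 9L, 4L),
  value2 = 1:8
)

# as.data.table() returns a new data.table; `data` stays a plain data.frame
dt <- as.data.table(data)

result <- dt[, .(max_value1  = value1[which.max(date)],
                 mean_value2 = mean(value2)),
             by = ID]
```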

akrun answered Feb 09 '23 01:02