R dplyr summarise one column value based on index of fun(another column)

Tags: r, dplyr

I have the data frame below and want the output shown under "Desired Output" at the end. Instead, I get the NA output shown in the middle. Is there any way to do what I want using dplyr?

library(dplyr)

x <- c(1234, 1234, 1234, 5678, 5678)
y <- c(95138, 30004, 90038, 01294, 15914)
z <- c('2014-01-20', '2014-10-30', '2015-04-12', '2010-2-28', '2015-01-01')
df <- data.frame(x, y, z)
df$z <- as.Date(df$z)
df %>% group_by(x) %>% summarise(y = y[max(z)])

What I get:
     x  y
1 1234 NA
2 5678 NA

Desired Output:
     x     y 
1 1234 90038
2 5678 15914
Gopala asked May 05 '15 15:05


2 Answers

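You can use which.max to get the numeric index of the maximum value of z, which can then be used to subset y. max(z) returns the maximum date itself rather than its position, so y[max(z)] indexes y with the date's underlying numeric value (days since 1970-01-01), which is out of range and gives NA.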

df %>%
    group_by(x) %>%
    summarise(y = y[which.max(z)])
#     x     y
#1 1234 90038
#2 5678 15914
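
To make the difference between max and which.max concrete, here is a minimal base R sketch using only the rows of the x == 1234 group from the question. The variable names latest and pos are introduced here for illustration and are not part of the original answer.

```r
# Rows of the x == 1234 group from the question.
y <- c(95138, 30004, 90038)
z <- as.Date(c("2014-01-20", "2014-10-30", "2015-04-12"))

latest <- max(z)               # the latest date itself, not its position
idx <- as.numeric(latest)      # days since 1970-01-01, far larger than length(y)
y[idx]                         # index is out of range, so the result is NA

pos <- which.max(z)            # position of the latest date within z
y[pos]                         # the y value on that row
```

Here pos is 3, so y[pos] picks 90038, matching the desired output for that group.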
akrun answered Sep 29 '22 16:09


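Use filter and max in dplyr. This keeps every row whose z equals its group's maximum, so all columns (x, y and z) are returned, and a tie for the latest date yields more than one row for that group.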

df %>% group_by(x) %>% filter(z == max(z))
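
If only the x and y columns from the desired output are wanted, one possible follow-up (a sketch, assuming the dplyr package is installed; the ungroup and select steps and the name result are additions for illustration, not part of the original answer) is:

```r
library(dplyr)

# Data from the question, built in one step.
df <- data.frame(
  x = c(1234, 1234, 1234, 5678, 5678),
  y = c(95138, 30004, 90038, 1294, 15914),
  z = as.Date(c("2014-01-20", "2014-10-30", "2015-04-12",
                "2010-02-28", "2015-01-01"))
)

result <- df %>%
  group_by(x) %>%
  filter(z == max(z)) %>%   # keep the row(s) with the latest date per group
  ungroup() %>%             # drop the grouping before further processing
  select(x, y)              # keep only the columns in the desired output

result
```

With this data each group has a single latest date, so result has one row per value of x.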
Shenglin Chen answered Sep 29 '22 15:09