
How to get a minimum value by group [duplicate]

Tags: r, dplyr

I have a data frame that looks like this:

library(dplyr)
test.df <- data.frame(id=c(1,1,1,3,3,3,3),
                      date=c("2016-02-13","2016-06-01",
                             "2016-09-01","2015-08-02",
                             "2015-09-21","2016-12-01",
                             "2017-02-11"))

test.df$date <- as.Date(test.df$date,format='%Y-%m-%d')

id    date
1   2016-02-13          
1   2016-06-01          
1   2016-09-01          
3   2015-08-02          
3   2015-09-21          
3   2016-12-01          
3   2017-02-11  

I want to create a new variable, first.login, that holds the first date for each id. The output should look like this:

id    date      first.login
1   2016-02-13  2016-02-13
1   2016-06-01  2016-02-13      
1   2016-09-01  2016-02-13      
3   2015-08-02  2015-08-02      
3   2015-09-21  2015-08-02      
3   2016-12-01  2015-08-02      
3   2017-02-11  2015-08-02

I tried code like this:

new.df <- test.df %>% 
  group_by(id) %>% 
  mutate(first.login = min(date))

But this returns the earliest date in the whole data frame, not the earliest date within each id group:

id    date      first.login
1   2016-02-13  2015-08-02
1   2016-06-01  2015-08-02      
1   2016-09-01  2015-08-02      
3   2015-08-02  2015-08-02      
3   2015-09-21  2015-08-02      
3   2016-12-01  2015-08-02      
3   2017-02-11  2015-08-02

This shouldn't be a tricky task, so I'm wondering what mistake I made. How can I get the earliest date within each id group?

Update: I had also tried summarize,

new.df <- test.df %>% 
  group_by(id) %>% 
  summarize(first.login = min(date))

but it returns a single row and a single column:

first.login
2015-08-02

It turns out there's nothing wrong with this code; I just needed to call dplyr::mutate explicitly, presumably because another attached package was masking mutate.
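The fix from the update can be sketched as follows. The question does not show the session, so the cause is an assumption: a common one is that plyr was attached after dplyr, in which case plyr's mutate and summarize take over and ignore the grouping. Qualifying each verb with dplyr:: avoids depending on the attach order.

```r
library(dplyr)

# Same data as in the question
test.df <- data.frame(
  id   = c(1, 1, 1, 3, 3, 3, 3),
  date = as.Date(c("2016-02-13", "2016-06-01", "2016-09-01", "2015-08-02",
                   "2015-09-21", "2016-12-01", "2017-02-11"))
)

# Namespace-qualified calls always resolve to dplyr's verbs,
# even if another attached package masks mutate()
new.df <- test.df %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(first.login = min(date)) %>%
  dplyr::ungroup()
```

Running conflicts(detail = TRUE) is one way to check which attached packages define a function named mutate.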

asked Dec 11 '25 08:12 by Helen


2 Answers

You can use summarize instead of mutate, which returns one row per id:

new.df <- test.df %>% 
  group_by(id) %>% 
  summarize(first.log = min(date))
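If the per-row column from the question is still needed, one option is to join the summary back onto the original rows. This is a minimal sketch rather than part of the original answer; the name firsts is made up for illustration.

```r
library(dplyr)

# Same data as in the question
test.df <- data.frame(
  id   = c(1, 1, 1, 3, 3, 3, 3),
  date = as.Date(c("2016-02-13", "2016-06-01", "2016-09-01", "2015-08-02",
                   "2015-09-21", "2016-12-01", "2017-02-11"))
)

# One row per id with its earliest date
firsts <- test.df %>%
  dplyr::group_by(id) %>%
  dplyr::summarize(first.login = min(date))

# Attach the per-id minimum to every original row
new.df <- dplyr::left_join(test.df, firsts, by = "id")
```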
answered Dec 13 '25 22:12 by waskuf


Here's a step-by-step base R solution:

# shorter names for convenience
x <- test.df$date
g <- test.df$id
# replace each group's dates with that group's minimum
split(x, g) <- lapply(split(x, g), min)
# x keeps the original row order, so it can be assigned directly
test.df$first.login <- x
# print the result
test.df
  id       date first.login
1  1 2016-02-13  2016-02-13
2  1 2016-06-01  2016-02-13
3  1 2016-09-01  2016-02-13
4  3 2015-08-02  2015-08-02
5  3 2015-09-21  2015-08-02
6  3 2016-12-01  2015-08-02
7  3 2017-02-11  2015-08-02

This is essentially how ave works internally.
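For comparison, the same result with ave itself might look like the sketch below. It uses only base R and rebuilds the question's data so that it runs on its own.

```r
# Same data as in the question
test.df <- data.frame(
  id   = c(1, 1, 1, 3, 3, 3, 3),
  date = as.Date(c("2016-02-13", "2016-06-01", "2016-09-01", "2015-08-02",
                   "2015-09-21", "2016-12-01", "2017-02-11"))
)

# ave() applies FUN within each group and returns a vector
# aligned with the original rows
test.df$first.login <- ave(test.df$date, test.df$id, FUN = min)
```

Because the result follows the original row order, this also works when the rows are not sorted by id.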

answered Dec 13 '25 20:12 by Jilber Urbina


