 

Get first and last values per group – dplyr group_by with last() and first()

The code below should group the data by the group column and then create two new columns with the first and last value of each group.

library(dplyr)

set.seed(123)

d <- data.frame(
    group = rep(1:3, each = 3),
    year = rep(seq(2000,2002,1),3),
    value = sample(1:9, replace = TRUE))

d %>% 
    group_by(group) %>%
    mutate(
        first = dplyr::first(value),
        last = dplyr::last(value)
    )

However, it does not work as it should. The expected result would be:

  group  year value first  last
  <int> <dbl> <int> <int> <int>
1     1  2000     3     3     4
2     1  2001     8     3     4
3     1  2002     4     3     4
4     2  2000     8     8     1
5     2  2001     9     8     1
6     2  2002     1     8     1
7     3  2000     5     5     5
8     3  2001     9     5     5
9     3  2002     5     5     5

Instead, I get this (the first and last values are taken over the entire data frame rather than within each group):

  group  year value first  last
  <int> <dbl> <int> <int> <int>
1     1  2000     3     3     5
2     1  2001     8     3     5
3     1  2002     4     3     5
4     2  2000     8     3     5
5     2  2001     9     3     5
6     2  2002     1     3     5
7     3  2000     5     3     5
8     3  2001     9     3     5
9     3  2002     5     3     5
asked Mar 07 '17 17:03 by phillyooo
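As a cross-check of the expected result that does not depend on dplyr at all, here is a base R sketch using ave(). The value column is hard-coded from the tables above rather than regenerated with set.seed(123), because sample() can return different draws depending on the R version (the default sampling algorithm changed in R 3.6.0).

```r
# Values copied from the tables above (hard-coded, see lead-in)
d <- data.frame(
    group = rep(1:3, each = 3),
    year  = rep(2000:2002, 3),
    value = c(3L, 8L, 4L, 8L, 9L, 1L, 5L, 9L, 5L)
)

# ave() applies FUN within each level of `group` and recycles the
# single result back onto every row of that group, like a grouped mutate()
d$first <- ave(d$value, d$group, FUN = function(x) x[1])
d$last  <- ave(d$value, d$group, FUN = function(x) x[length(x)])
print(d)
```

The first and last columns produced this way match the expected output in the question, which confirms that the expected table is the correct per-group result.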


People also ask

How do you use first and last in R?

For instance, if you want the first three rows and the last three rows of each group, you can use data.table: DT[, .SD[c(1:3, (.N-2):.N)], by = Species]
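For comparison, here is a base R sketch of the same "first and last three rows per group" idea that needs no data.table. It uses split(), head() and tail() on the built-in iris data set (assumed here because the snippet groups by Species). If a group had fewer than six rows, head() and tail() would overlap and some rows would appear twice.

```r
# Keep the first and last three rows of one group's data frame
first_last_3 <- function(df) rbind(head(df, 3), tail(df, 3))

# Split iris by Species, apply the helper to each piece,
# then stack the pieces back into one data frame
res <- do.call(rbind, lapply(split(iris, iris$Species), first_last_3))
print(res)
```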

What is the purpose of the group_by() function?

Most data operations are done on groups defined by variables. group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.
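A minimal dplyr sketch of that grouped/ungrouped distinction, on the question's data (values hard-coded from the tables). group_vars() and n_groups() inspect the grouping, and the same mutate() call gives per-group results on a grouped tibble but whole-table results after ungroup(), which is exactly the pattern in the unexpected output above.

```r
library(dplyr)

d <- data.frame(
    group = rep(1:3, each = 3),
    value = c(3L, 8L, 4L, 8L, 9L, 1L, 5L, 9L, 5L)
)

g <- group_by(d, group)   # grouped tibble: later verbs work "by group"
print(group_vars(g))      # "group"
print(n_groups(g))        # 3

u <- ungroup(g)           # back to a plain, ungrouped tibble
print(group_vars(u))      # character(0)

# The same mutate() call, with and without grouping
per_group <- mutate(g, first = first(value))$first  # 3 3 3 8 8 8 5 5 5
whole     <- mutate(u, first = first(value))$first  # 3 3 3 3 3 3 3 3 3
```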

What does group_by() do in R?

group_by() from the dplyr package groups a data frame by one or more columns. Summary functions such as mean(), sum(), n() (count), max() and min() are then applied within each group rather than to the whole data frame.
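A minimal dplyr sketch of grouped summaries on the question's data (values hard-coded from the tables). Each summary function is evaluated once per group, so the output has one row per group; the column names rows, total, avg, lo and hi are arbitrary choices for this example.

```r
library(dplyr)

d <- data.frame(
    group = rep(1:3, each = 3),
    value = c(3L, 8L, 4L, 8L, 9L, 1L, 5L, 9L, 5L)
)

# One output row per group; each function sees only that group's rows
stats <- d %>%
    group_by(group) %>%
    summarise(
        rows  = n(),
        total = sum(value),
        avg   = mean(value),
        lo    = min(value),
        hi    = max(value)
    )
print(stats)
```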

Can you group by multiple columns in dplyr?

Yes. group_by() groups the data in a data frame by the columns passed as arguments, and you can pass several of them, e.g. group_by(group, year).
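A sketch of grouping by two columns, using the built-in mtcars data: each observed (cyl, gear) combination becomes its own group. The .groups = "drop" argument (available in dplyr 1.0.0 and later) returns an ungrouped result and silences the message summarise() otherwise prints about regrouping.

```r
library(dplyr)

# One group per observed (cyl, gear) pair
by_cyl_gear <- mtcars %>%
    group_by(cyl, gear) %>%
    summarise(cars = n(), avg_mpg = mean(mpg), .groups = "drop")
print(by_cyl_gear)
```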


2 Answers

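Calling dplyr::mutate() explicitly did the trick. Presumably another attached package was masking dplyr's mutate() with a version that ignores the grouping (plyr::mutate() is a common culprit when plyr is loaded after dplyr):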

d %>% 
    group_by(group) %>%
    dplyr::mutate(
        first = dplyr::first(value),
        last = dplyr::last(value)
    )
answered Sep 20 '22 07:09 by phillyooo
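A sketch of how one might confirm and avoid that masking, assuming (as the answer implies but does not state) that a second package's mutate() was shadowing dplyr's. Base R's find() lists every attached package that exports a given name, in search-path order; the data are hard-coded from the question's tables.

```r
library(dplyr)

d <- data.frame(
    group = rep(1:3, each = 3),
    year  = rep(2000:2002, 3),
    value = c(3L, 8L, 4L, 8L, 9L, 1L, 5L, 9L, 5L)
)

# Every attached package exporting mutate(); the first entry is the one
# a bare mutate() call resolves to (e.g. "package:plyr" would come first
# if plyr had been attached after dplyr)
print(find("mutate"))

# Qualifying every verb with dplyr:: sidesteps masking entirely,
# whatever order the packages were attached in
res <- d %>%
    dplyr::group_by(group) %>%
    dplyr::mutate(
        first = dplyr::first(value),
        last  = dplyr::last(value)
    ) %>%
    dplyr::ungroup()
print(res)
```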


You can also use summarise() from dplyr to compute the first and last non-missing value of each group and then join the result back onto the original data:

d %>%
    group_by(group) %>%
    summarise(first_value = first(na.omit(value)),
              last_value = last(na.omit(value))) %>%
    left_join(d, ., by = "group")
answered Sep 21 '22 07:09 by Arun kumar mahesh
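The same summarise-then-join idea in base R, as a sketch for comparison: tapply() plays the role of summarise() and merge(..., all.x = TRUE) the role of left_join(). Two NA values are planted in the data for illustration (they are not in the original question) to show why skipping missing values matters; first_non_na() and last_non_na() are hypothetical helper names, not library functions.

```r
d <- data.frame(
    group = rep(1:3, each = 3),
    year  = rep(2000:2002, 3),
    value = c(NA, 8L, 4L, 8L, 9L, NA, 5L, 9L, 5L)  # NAs added for illustration
)

# Hypothetical helpers: first / last non-missing element, NA if none
first_non_na <- function(x) { x <- x[!is.na(x)]; if (length(x)) x[1] else NA }
last_non_na  <- function(x) { x <- x[!is.na(x)]; if (length(x)) x[length(x)] else NA }

# One row per group (the summarise() step); tapply() orders groups by level
summ <- data.frame(
    group       = sort(unique(d$group)),
    first_value = as.vector(tapply(d$value, d$group, first_non_na)),
    last_value  = as.vector(tapply(d$value, d$group, last_non_na))
)

# Join the summary back onto every row (the left_join() step),
# then restore the original row order
res <- merge(d, summ, by = "group", all.x = TRUE)
res <- res[order(res$group, res$year), ]
print(res)
```

Without the NA filtering, group 1 would report NA as its first value and group 2 NA as its last value.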