
dplyr - Get last value for each year

Tags: r, dplyr

I have a tbl_df that looks like this:

> d
Source: local data frame [3,703 x 3]

         date  value year
1  2001-01-01 0.1218 2001
2  2001-01-02 0.1216 2001
3  2001-01-03 0.1216 2001
4  2001-01-04 0.1214 2001
5  2001-01-05 0.1214 2001
..        ...    ...  ...

where dates range across several years.

I would like to get the latest value of value for each year (the last date in a year is not consistently 31 December). Is there a way to do that using an idiom such as d %>% group_by(year) %>% summarise(...)?

Alexandre Halm asked May 17 '15 14:05




1 Answer

Here are some options. The first assumes the rows are already sorted by date within each year:

library(dplyr)
d %>% 
  group_by(year) %>%
  summarise(value=last(value))
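To try this without the original data, here is a minimal sketch on a made-up data frame. The dates and values below are hypothetical and only mimic the shape of d from the question; the rows are in date order, which is what last() relies on here.

```r
library(dplyr)

# Hypothetical toy data: two years, neither ending on 31 December
d <- data.frame(
  date  = as.Date(c("2001-01-01", "2001-06-15", "2001-12-28",
                    "2002-01-02", "2002-12-30")),
  value = c(0.1218, 0.1216, 0.1214, 0.1300, 0.1290)
)
d$year <- as.integer(format(d$date, "%Y"))

# Rows are already sorted by date, so last() picks the latest value per year
res <- d %>%
  group_by(year) %>%
  summarise(value = last(value))

print(res)
```

The result should have one row per year, holding the value from the final row of that year.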

Or maybe (the description is not very clear on whether the rows are ordered), take the row with the latest date in each year:

d %>% 
  group_by(year) %>%
  slice(which.max(date)) %>%
  select(value) 

Or

d %>%
  group_by(year) %>%
  filter(date==max(date)) %>%
  select(value)
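The slice() and filter() variants above behave differently when a year has more than one row on its latest date: which.max() returns only the first matching position, while date == max(date) keeps every tied row. A small sketch with made-up data (the duplicate date is an assumption added for illustration, it is not in the question):

```r
library(dplyr)

# Hypothetical toy data: 2001 has two rows sharing its latest date
d <- data.frame(
  date  = as.Date(c("2001-01-01", "2001-12-28", "2001-12-28", "2002-12-30")),
  value = c(0.1218, 0.1214, 0.1215, 0.1290),
  year  = c(2001, 2001, 2001, 2002)
)

# which.max() gives the first position of the maximum: one row per year
by_slice <- d %>%
  group_by(year) %>%
  slice(which.max(date))

# The equality test keeps all rows tied for the maximum date
by_filter <- d %>%
  group_by(year) %>%
  filter(date == max(date))

nrow(by_slice)
nrow(by_filter)
```

If each date appears only once per year, as in the question's sample, both variants return the same rows.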

Or we can use arrange to sort by 'date' (in case it is not already ordered) and then take the last value

d %>%
  group_by(year) %>%
  arrange(date) %>%
  summarise(value=last(value))
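To see why the sort matters, the sketch below shuffles a made-up data frame (hypothetical values) so that the final row of each year is no longer the latest date. It also shows an alternative that is not part of the answer above: passing order_by to last(), which orders the values by date without a separate arrange() step.

```r
library(dplyr)

# Hypothetical toy data, deliberately NOT in date order
d <- data.frame(
  date  = as.Date(c("2001-12-28", "2001-01-01", "2002-12-30", "2002-01-02")),
  value = c(0.1214, 0.1218, 0.1290, 0.1300)
)
d$year <- as.integer(format(d$date, "%Y"))

# Without sorting, last() returns the final row per year, not the latest date
unsorted_last <- d %>%
  group_by(year) %>%
  summarise(value = last(value))

# Sort by date first, then take the last value per year
sorted_last <- d %>%
  arrange(date) %>%
  group_by(year) %>%
  summarise(value = last(value))

# Alternative: let last() order the values by date itself
ordered_last <- d %>%
  group_by(year) %>%
  summarise(value = last(value, order_by = date))
```

Here sorted_last and ordered_last should agree with each other, while unsorted_last picks up the January values.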

In case you want to try data.table, here is one option

library(data.table)
setDT(d)[, value[which.max(date)], year]
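With an unnamed expression in j, as in the call above, data.table labels the result column V1. A minimal sketch on made-up data (hypothetical values) that names the column explicitly by wrapping the expression in .():

```r
library(data.table)

# Hypothetical toy data with the same columns as d in the question
d <- data.table(
  date  = as.Date(c("2001-01-01", "2001-12-28", "2002-01-02", "2002-12-30")),
  value = c(0.1218, 0.1214, 0.1300, 0.1290),
  year  = c(2001L, 2001L, 2002L, 2002L)
)

# Same logic as above, but the result column is called 'value' instead of V1
res <- d[, .(value = value[which.max(date)]), by = year]

print(res)
```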

Or, as @David Arenburg commented (this returns the whole row with the latest date for each year, not just the value)

unique(setDT(d)[order(-date)], by = "year")
akrun answered Oct 23 '22 18:10