I have a tbl_df that looks like this:
> d
Source: local data frame [3,703 x 3]
date value year
1 2001-01-01 0.1218 2001
2 2001-01-02 0.1216 2001
3 2001-01-03 0.1216 2001
4 2001-01-04 0.1214 2001
5 2001-01-05 0.1214 2001
.. ... ... ...
where the dates span several years.
I would like to get the latest entry of the value column for each year (which does not always fall on 31 December). Is there a way to do that using an idiom such as d %>% group_by(year) %>% summarise(...)?
Outside dplyr, one approach is to first create a list with one element per year, then use the tail() function together with sapply() to extract the last value of every element in the list.
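A minimal base-R sketch of the list-plus-tail() approach, assuming a small hypothetical stand-in for d (the dates and values below are made up). split() builds the per-year list, and the rows are sorted by date first so that "last" really means "latest":

```r
# Hypothetical toy data standing in for `d`; the values are made up
d <- data.frame(
  date  = as.Date(c("2001-01-01", "2001-12-28", "2002-01-02", "2002-12-30")),
  value = c(0.1218, 0.1190, 0.1185, 0.1170)
)
d$year <- as.integer(format(d$date, "%Y"))

# Sort by date so the last element of each year is the latest observation
d <- d[order(d$date), ]

# One list element per year, holding that year's values in date order
by_year <- split(d$value, d$year)

# tail(x, n = 1) returns the last element; sapply() applies it to every list element
last_values <- sapply(by_year, tail, n = 1)
```

The result is a numeric vector named by year ("2001", "2002").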
%>% is called the forward pipe operator in R. It chains commands by forwarding a value, or the result of an expression, into the next function call or expression. It is defined by the magrittr package (CRAN) and is used heavily by dplyr (CRAN).
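A short illustration of what the pipe does, assuming magrittr is installed (dplyr re-exports the same operator). The vector x is an arbitrary example: the left-hand side becomes the first argument of the call on the right, and chained pipes read left to right instead of inside out.

```r
library(magrittr)  # defines %>%; dplyr re-exports it

x <- c(0.1218, 0.1216, 0.1214)

# `x %>% f(y)` is evaluated as `f(x, y)`
piped  <- x %>% round(2)
nested <- round(x, 2)

# Chaining forwards each intermediate result into the next call;
# this is the same as tail(sort(x), 1)
latest <- x %>% sort() %>% tail(1)
```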
To find the last element of a vector, we can also use the tail() function.
You can also do this with dplyr: sort the rows with arrange(), then use filter() to keep the first and the last row of each group. This works like a duplicate removal that keeps the first and the last row of each group at the same time.
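A sketch of the arrange() plus filter() idea, assuming dplyr is installed and using hypothetical, deliberately unsorted toy data. Filtering on row_number() with | keeps a single row (not two copies) when a year has only one observation:

```r
library(dplyr)

# Hypothetical toy data (made-up values), deliberately not in date order
d <- tibble(
  date  = as.Date(c("2001-06-01", "2001-01-01", "2001-12-28",
                    "2002-12-30", "2002-01-02", "2002-07-01")),
  value = c(0.1200, 0.1218, 0.1190, 0.1170, 0.1185, 0.1180)
)
d <- d %>% mutate(year = as.integer(format(date, "%Y")))

first_last <- d %>%
  arrange(year, date) %>%                              # sort by date within each year
  group_by(year) %>%
  filter(row_number() == 1 | row_number() == n()) %>%  # first and last row per year
  ungroup()
```

For the question itself, only the row_number() == n() condition is needed.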
Here are some options:
library(dplyr)
d %>%
  group_by(year) %>%
  summarise(value = last(value))
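A reproducible version of the first option, assuming dplyr is installed and using a hypothetical toy stand-in for d. Note that last() simply takes the final row of each year, so it returns the latest value only because the rows are already in date order:

```r
library(dplyr)

# Hypothetical toy data (made-up values), already in date order
d <- tibble(
  date  = as.Date(c("2001-01-01", "2001-01-02", "2001-12-28",
                    "2002-01-02", "2002-12-30")),
  value = c(0.1218, 0.1216, 0.1190, 0.1185, 0.1170)
)
d <- d %>% mutate(year = as.integer(format(date, "%Y")))

# last() returns the final element of `value` within each year
res <- d %>%
  group_by(year) %>%
  summarise(value = last(value))
```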
Or maybe you want the row with the latest date in each year (the description is not entirely clear):
d %>%
  group_by(year) %>%
  slice(which.max(date)) %>%
  select(value)
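In dplyr 1.0.0 and later, slice_max() can express "the row with the largest date" directly. This sketch assumes that version and uses hypothetical toy data; with_ties = FALSE keeps exactly one row per year, which mirrors which.max() returning the first maximum:

```r
library(dplyr)

# Hypothetical toy data (made-up values) covering two years
d <- tibble(
  date  = as.Date(c("2001-01-01", "2001-12-28", "2002-01-02", "2002-12-30")),
  value = c(0.1218, 0.1190, 0.1185, 0.1170)
)
d <- d %>% mutate(year = as.integer(format(date, "%Y")))

# Keep the row with the largest date in each year, one row per year
res <- d %>%
  group_by(year) %>%
  slice_max(date, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(year, value)
```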
Or
d %>%
  group_by(year) %>%
  filter(date == max(date)) %>%
  select(value)
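The which.max() and filter() variants only differ when a year has several rows on its latest date. A small check, assuming dplyr is installed and a hypothetical data set in which 2001 has two rows on 2001-12-28: which.max() keeps the first of them, while filter(date == max(date)) keeps both.

```r
library(dplyr)

# Hypothetical toy data (made-up values) with two rows on 2001's last date
d <- tibble(
  date  = as.Date(c("2001-12-28", "2001-12-28", "2002-12-30")),
  value = c(0.1190, 0.1191, 0.1170),
  year  = c(2001L, 2001L, 2002L)
)

# which.max() picks the first row with the maximum date: one row per year
one_per_year <- d %>% group_by(year) %>% slice(which.max(date)) %>% ungroup()

# filter() keeps every row equal to the maximum: both tied 2001 rows survive
all_ties <- d %>% group_by(year) %>% filter(date == max(date)) %>% ungroup()
```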
Or we can use arrange() to order by 'date' (in case it is not already ordered) and then take the last value:
d %>%
  group_by(year) %>%
  arrange(date) %>%
  summarise(value = last(value))
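As a variation on the arrange() step, dplyr's last() also accepts an order_by argument, so the ordering can be stated inside summarise() instead of sorting the data first. A sketch assuming dplyr is installed and hypothetical, deliberately shuffled toy data:

```r
library(dplyr)

# Hypothetical toy data (made-up values), deliberately out of date order
d <- tibble(
  date  = as.Date(c("2001-12-28", "2001-01-01", "2002-12-30", "2002-01-02")),
  value = c(0.1190, 0.1218, 0.1170, 0.1185)
)
d <- d %>% mutate(year = as.integer(format(date, "%Y")))

# last(value, order_by = date) returns the value on the largest date in each year
res <- d %>%
  group_by(year) %>%
  summarise(value = last(value, order_by = date))
```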
In case you want to try data.table, here is one option:
library(data.table)
setDT(d)[, value[which.max(date)], year]
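When j is an unnamed expression, data.table names the result column V1. A sketch, assuming data.table is installed and using hypothetical toy data, that compares this with wrapping j in .() to give the column a name:

```r
library(data.table)

# Hypothetical toy data (made-up values), deliberately out of date order
d <- data.frame(
  date  = as.Date(c("2001-12-28", "2001-01-01", "2002-12-30", "2002-01-02")),
  value = c(0.1190, 0.1218, 0.1170, 0.1185)
)
d$year <- as.integer(format(d$date, "%Y"))
setDT(d)  # convert to a data.table by reference

# Unnamed j: the result column is called V1
unnamed <- d[, value[which.max(date)], year]

# Named j via .(): the result column is called value
named <- d[, .(value = value[which.max(date)]), by = year]
```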
Or, as @David Arenburg commented:
unique(setDT(d)[order(-date)], by = "year")
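The unique() version works because unique(..., by = "year") keeps the first row it meets for each year, and after sorting by decreasing date that row is the latest one, with all of its columns. A sketch, assuming data.table is installed and hypothetical toy data, alongside an equivalent .SD[.N] formulation that sorts ascending and takes the last row per year:

```r
library(data.table)

# Hypothetical toy data (made-up values)
d <- data.table(
  date  = as.Date(c("2001-01-01", "2001-12-28", "2002-01-02", "2002-12-30")),
  value = c(0.1218, 0.1190, 0.1185, 0.1170)
)
d[, year := as.integer(format(date, "%Y"))]

# Newest first, then keep the first row per year: full rows, years in 2002, 2001 order
latest <- unique(d[order(-date)], by = "year")

# Equivalent: sort ascending and take the last row (.SD[.N]) of each year
latest2 <- d[order(date), .SD[.N], by = year]
```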