I have a dataframe df that looks like the one below. id identifies a particular stock, value holds dummy values for the stock price, and year is a helper column that I calculated for splitting the data:
id date       value year
1  2020-12-30 11    2020
1  2020-12-09 12    2020
1  2020-08-01 13    2020
1  2019-12-30 14    2019
1  2019-12-09 15    2019
1  2019-08-01 16    2019
2  2020-12-30 17    2020
2  2020-12-09 18    2020
2  2020-08-01 19    2020
2  2019-12-29 20    2019
2  2019-12-09 21    2019
2  2019-08-01 22    2019
I want to find for each id in each year what the last day for which I have data is. Typically, this is the year-end, but this is not always the case in my large data set, so I don't want to hard-code the year end.
I have already split it into a list based on the id and the year with the following code and result:
list <- split(df, list(df$id, df$year))
Now, in each of the 4 elements of the list, I want to create a new column that gives me the max value of the date column in the respective list element. E.g. I expect the following output for the first list element:
id date       value year maxdate
1  2020-12-30 11    2020 2020-12-30
1  2020-12-09 12    2020 2020-12-30
1  2020-08-01 13    2020 2020-12-30
Can you help me achieve the desired output?
I have tried to use some version of the apply family of functions, but could not get it to work just based on the date column in each of my list elements.
Thank you very much in advance!
Best regards, C
We can use lapply to loop over the list and transform to create the 'maxdate' column:
list1 <- lapply(list1, transform, maxdate = max(date))
This assumes that the 'date' column is of class Date.
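If the dates were read in as character strings, they can be converted first. The following is a minimal sketch rather than part of the original answer: it rebuilds the question's example data by hand (assuming the dates start out as character) and then applies the lapply/transform step. The names in the comments, such as "1.2020", are the ones split() generates from the id and year values.

```r
# Rebuild the example data from the question; dates start out as character
df <- data.frame(
  id    = rep(1:2, each = 6),
  date  = c("2020-12-30", "2020-12-09", "2020-08-01",
            "2019-12-30", "2019-12-09", "2019-08-01",
            "2020-12-30", "2020-12-09", "2020-08-01",
            "2019-12-29", "2019-12-09", "2019-08-01"),
  value = 11:22,
  stringsAsFactors = FALSE
)

# Convert to Date class so that max() compares calendar dates
df$date <- as.Date(df$date, format = "%Y-%m-%d")

# Helper column used for splitting
df$year <- as.integer(format(df$date, "%Y"))

# One list element per id/year combination ("1.2019", "2.2019", "1.2020", "2.2020")
list1 <- split(df, list(df$id, df$year), drop = TRUE)

# Add the latest date within each element as a new column
list1 <- lapply(list1, transform, maxdate = max(date))
```

list1[["1.2020"]] then holds the three 2020 rows for id 1, each with the same maxdate.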
Or, with dplyr and purrr:
library(dplyr)
library(purrr)
list1 <- map(list1, ~ .x %>% mutate(maxdate = max(date)))
It can be simplified without splitting as well if we use a group by operation
df %>% group_by(id, year) %>% mutate(maxdate = max(date))
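For readers who prefer to stay in base R, a similar grouped result can be sketched with ave(). This is an alternative that the original answer does not use. The small data frame below is a shortened, hypothetical version of the question's data, and the sketch takes the maximum on the numeric day counts and converts back afterwards, so the result does not depend on how ave() handles the Date class.

```r
# Shortened, hypothetical version of the question's data
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  date  = as.Date(c("2020-12-30", "2020-12-09", "2019-12-30",
                    "2020-12-30", "2019-12-29", "2019-08-01")),
  value = 11:16
)
df$year <- as.integer(format(df$date, "%Y"))

# ave() applies FUN within each id/year group and returns a vector
# aligned with the original rows; the maximum is taken on the numeric
# day count and converted back to Date afterwards
max_num <- ave(as.numeric(df$date), df$id, df$year, FUN = max)
df$maxdate <- as.Date(max_num, origin = "1970-01-01")
```

As with the group_by version, the data frame stays in one piece, and each row carries the latest date of its id/year group.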
Here, 'list1' was created with:
list1 <- split(df, list(df$id, df$year), drop = TRUE)
NOTE: It is better not to name objects with function names, e.g. list.