
R: Apply a function to a specific column in each element of a list

Tags:

r

I have a data frame df that looks like the one below. id identifies a stock, value holds dummy values for the stock price, and year is a helper column I calculated so that I can split the data:

id      date value   year
1 2020-12-30    11   2020
1 2020-12-09    12   2020
1 2020-08-01    13   2020 
1 2019-12-30    14   2019
1 2019-12-09    15   2019
1 2019-08-01    16   2019
2 2020-12-30    17   2020
2 2020-12-09    18   2020
2 2020-08-01    19   2020
2 2019-12-29    20   2019
2 2019-12-09    21   2019
2 2019-08-01    22   2019

I want to find for each id in each year what the last day for which I have data is. Typically, this is the year-end, but this is not always the case in my large data set, so I don't want to hard-code the year end.

I have already split it into a list based on the id and the year with the following code:

list <- split(df, list(df$id, df$year))

Now, in each of the 4 elements of the list, I want to create a new column that holds the maximum of the date column within that element. For example, I expect the following output for the first list element:

id      date value   year    maxdate
1 2020-12-30    11   2020 2020-12-30
1 2020-12-09    12   2020 2020-12-30
1 2020-08-01    13   2020 2020-12-30

Can you help me achieve the desired output?

I have tried to use some version of the apply family of functions, but could not get it to work just based on the date column in each of my list elements.

Thank you very much in advance!

Best regards, C

Chris937 asked Jan 24 '23 12:01


1 Answer

We can use lapply to loop over the list and transform to create the 'maxdate' column in each element:

list1 <- lapply(list1, transform, maxdate = max(date))

This assumes that the 'date' column is of class Date.
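As a minimal, self-contained sketch of the lapply approach, the toy data frame below mirrors the question's data. It assumes the dates arrive as character strings in "%Y-%m-%d" format, so they are converted with as.Date first; the way year is derived here is also an assumption, since the question does not show how it was calculated.

```r
# Toy data mirroring the question; `date` starts out as character (assumption).
df <- data.frame(
  id    = rep(1:2, each = 6),
  date  = c("2020-12-30", "2020-12-09", "2020-08-01",
            "2019-12-30", "2019-12-09", "2019-08-01",
            "2020-12-30", "2020-12-09", "2020-08-01",
            "2019-12-29", "2019-12-09", "2019-08-01"),
  value = 11:22,
  stringsAsFactors = FALSE
)

# Convert to Date so that max() returns a Date rather than a string.
df$date <- as.Date(df$date, format = "%Y-%m-%d")

# Helper column for splitting (one possible way to derive it).
df$year <- as.integer(format(df$date, "%Y"))

# One list element per id/year combination that actually occurs.
list1 <- split(df, list(df$id, df$year), drop = TRUE)

# Add the per-element maximum date; the single value is recycled to all rows.
list1 <- lapply(list1, transform, maxdate = max(date))

# Inspect the element for id 1 in 2020.
list1[["1.2020"]]
```

The list elements are named by joining id and year with a dot, so each group can be looked up by name as in the last line.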


Or, using the tidyverse:

library(dplyr)
library(purrr)
list1 <- map(list1, ~ .x %>%
                         mutate(maxdate = max(date)))

This can also be simplified by skipping the split and using a grouped operation instead:

df %>%
    group_by(id, year) %>%
    mutate(maxdate = max(date))
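For readers who want to stay in base R, the same "no split" idea can be sketched with ave(). This is an alternative to the dplyr pipeline, not part of it. The toy data and the round trip through numeric values are assumptions made for illustration: the dates are converted to day counts before grouping and back to Date afterwards, so the result does not depend on how ave() handles the Date class.

```r
# Toy data (assumption): every id/year combination present has at least one row.
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  date  = as.Date(c("2020-12-30", "2020-08-01", "2019-12-30",
                    "2020-12-09", "2019-12-29", "2019-08-01")),
  value = 11:16
)
df$year <- as.integer(format(df$date, "%Y"))

# Group-wise maximum of the dates, computed on their numeric day counts.
max_num <- ave(as.numeric(df$date), df$id, df$year, FUN = max)

# Convert the day counts back to Date.
df$maxdate <- as.Date(max_num, origin = "1970-01-01")

df
```

ave() returns a vector aligned with the original rows, so the result can be assigned straight back as a new column without reordering the data.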

In the lapply and map examples above, list1 is created with:

list1 <- split(df, list(df$id, df$year), drop = TRUE)
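The drop = TRUE argument matters when some id/year combinations never occur. A small sketch with hypothetical data (df2 is made up for illustration, with id 2 having no rows in 2019) shows the difference:

```r
# Hypothetical data: id 2 has no observations in 2019.
df2 <- data.frame(
  id   = c(1, 1, 2),
  date = as.Date(c("2019-12-30", "2020-12-30", "2020-12-09"))
)
df2$year <- as.integer(format(df2$date, "%Y"))

# Without drop = TRUE, every id/year combination gets an element,
# including the combination that has no rows.
with_empty <- split(df2, list(df2$id, df2$year))

# With drop = TRUE, only combinations that occur in the data are kept.
no_empty <- split(df2, list(df2$id, df2$year), drop = TRUE)

# Number of rows per element in each version.
sapply(with_empty, nrow)
sapply(no_empty, nrow)
```

Calling max(date) on a zero-row element has nothing to take a maximum of, so dropping the empty combinations before the lapply step avoids that case.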

NOTE: It is better not to name objects after functions, e.g. list.

akrun answered Jan 31 '23 14:01