
How can I create a new dataframe in R that combines the first date and last date available for each ID?

Tags:

r

For instance, suppose I have the following dataframe:

ID<-c("A", "A", "B", "B", "B", "C")
StartDate<-as.Date(c("2018-01-01", "2019-02-05", "2016-04-18", "2020-03-03", "2021-12-13", "2014-03-03"), "%Y-%m-%d")
TermDate<-as.Date(c("2018-02-01", NA, "2016-05-18", "2020-04-03", "2021-12-15", "2014-04-03"), "%Y-%m-%d")
df<-data.frame(ID=ID, StartDate=StartDate, TermDate=TermDate)

  ID  StartDate   TermDate
1  A 2018-01-01 2018-02-01
2  A 2019-02-05       <NA>
3  B 2016-04-18 2016-05-18
4  B 2020-03-03 2020-04-03
5  B 2021-12-13 2021-12-15
6  C 2014-03-03 2014-04-03

What I'm ultimately trying to get is the following:


  ID  StartDate   TermDate
1  A 2018-01-01       <NA>
2  B 2016-04-18 2021-12-15
3  C 2014-03-03 2014-04-03

GM01 asked Feb 24 '21 09:02




4 Answers

There are functions first and last in dplyr and data.table that could help here.

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(StartDate = first(StartDate), 
            TermDate = last(TermDate))

#  ID    StartDate  TermDate  
#* <chr> <date>     <date>    
#1 A     2018-01-01 NA        
#2 B     2016-04-18 2021-12-15
#3 C     2014-03-03 2014-04-03

With data.table:

library(data.table)
setDT(df)[, .(StartDate = first(StartDate), TermDate = last(TermDate)), ID]

Ronak Shah answered Nov 03 '22 00:11


Using min and max instead of first and last removes the need to sort the data first, in case it is not already sorted by date within each ID.

df %>% group_by(ID) %>%
  summarise(StartDate = min(StartDate),
         TermDate = max(TermDate))

# A tibble: 3 x 3
  ID    StartDate  TermDate  
* <chr> <date>     <date>    
1 A     2018-01-01 NA        
2 B     2016-04-18 2021-12-15
3 C     2014-03-03 2014-04-03
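
The NA for ID A appears because max returns NA whenever any value in the group is missing. A minimal base R sketch of that behaviour, using the two TermDate values for A from the question (the names term_a, latest_default and latest_known are made up for illustration):

```r
# Term dates for ID "A" from the question; NA marks a missing end date
term_a <- as.Date(c("2018-02-01", NA))

# Default behaviour: any NA in the group makes the maximum NA
latest_default <- max(term_a)

# With na.rm = TRUE the NA is dropped and the latest known date is returned
latest_known <- max(term_a, na.rm = TRUE)

print(latest_default)
print(latest_known)
```

So the default suits the desired output here, where a missing TermDate should carry through to the summary. Adding na.rm = TRUE would instead report the latest known TermDate.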

To see how first and last depend on row order, suppose df is not sorted by date within each ID:

> df
  ID  StartDate   TermDate
1  A 2019-02-05       <NA>
2  A 2018-01-01 2018-02-01
3  B 2016-04-18 2016-05-18
4  B 2020-03-03 2020-04-03
5  B 2021-12-13 2021-12-15
6  C 2014-03-03 2014-04-03

df %>% group_by(ID) %>%
  summarise(StartDate = first(StartDate),
         TermDate = last(TermDate))

# A tibble: 3 x 3
  ID    StartDate  TermDate  
* <chr> <date>     <date>    
1 A     2019-02-05 2018-02-01
2 B     2016-04-18 2021-12-15
3 C     2014-03-03 2014-04-03
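
If first and last are preferred anyway, one option is to sort by date within each ID before summarising. A minimal sketch, assuming dplyr is installed; df_unsorted and result are names introduced here for illustration, and the data mirrors the reordered rows above:

```r
library(dplyr)

# Same rows as the question, but with the two "A" rows swapped
df_unsorted <- data.frame(
  ID = c("A", "A", "B", "B", "B", "C"),
  StartDate = as.Date(c("2019-02-05", "2018-01-01", "2016-04-18",
                        "2020-03-03", "2021-12-13", "2014-03-03")),
  TermDate = as.Date(c(NA, "2018-02-01", "2016-05-18",
                       "2020-04-03", "2021-12-15", "2014-04-03"))
)

result <- df_unsorted %>%
  arrange(ID, StartDate) %>%   # order rows by StartDate within each ID
  group_by(ID) %>%
  summarise(StartDate = first(StartDate),
            TermDate = last(TermDate))

result
```

Here last(TermDate) picks the TermDate of the row with the latest StartDate, so A gets NA because its most recent record has no end date.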

AnilGoyal answered Nov 02 '22 23:11


We can also do

library(dplyr)
df %>%
  group_by(ID) %>%
  summarise(StartDate = StartDate[1], 
            TermDate = TermDate[n()])

akrun answered Nov 03 '22 00:11


Another data.table option

setDT(df)[
  ,
  as.list(
    setNames(
      data.frame(.SD)[cbind(c(1, .N), c(1, 2))],
      names(.SD)
    )
  ), ID
]

gives

   ID  StartDate   TermDate
1:  A 2018-01-01       <NA>
2:  B 2016-04-18 2021-12-15
3:  C 2014-03-03 2014-04-03

ThomasIsCoding answered Nov 03 '22 00:11