I have a data frame similar to this:
df<-read.csv(text="id;census;startDate;endDate
ZF001;died;16.10.2012;16.05.2015
ZF002;alive;20.10.2013
ZF003;alive;04.11.2013;
ZF004;died;11.11.2013;20.12.2014
ZF005;died;25.11.2013;16.06.2015
ZF006;alive;25.11.2014;
ZF007;survived;02.12.2014;19.01.2015
ZF008;alive;11.12.2014;
ZF009;survived;28.01.2015;12.03.2015", sep=";")
df$startDate<-as.Date(df$startDate, "%d.%m.%Y")
df$endDate<-as.Date(df$endDate, "%d.%m.%Y")
What I need is the following: A new data frame containing how many days per year the proband was part of the study. It should look similar to this:
id year days
ZF001 2012 77
ZF001 2013 365
ZF001 2014 365
ZF001 2015 135
etc.
I'm assuming you want this only for the dead probands (as the live ones don't have an end date), here's a possible data.table
solution which is pretty much self explanatory
library(data.table)
setDT(df)[census == "died",
as.data.table(table(year(seq.Date(startDate, endDate, by = "day")))),
by = id]
# id V1 N
# 1: ZF001 2012 77
# 2: ZF001 2013 365
# 3: ZF001 2014 365
# 4: ZF001 2015 136
# 5: ZF004 2013 51
# 6: ZF004 2014 354
# 7: ZF005 2013 37
# 8: ZF005 2014 365
# 9: ZF005 2015 167
Basically we calculate all the days from the start to the end date per id
, then, we are using the year
function in order to extract years, and then just calculating frequencies
Or an equivalent dplyr
solution
library(dplyr)
df %>%
group_by(id) %>%
filter(census=='died') %>%
do(as.data.frame(table(year(seq.Date(.$startDate, .$endDate, by ='day')))))
Edit per comments:
If you want this for all the patients (dead or alive), while for the alive ones you want to use Sys.Date
, we could define a simple helper function in that case
dateFunc <- function(x, y){
if(is.na(y)) {
as.data.table(table(year(seq.Date(x, Sys.Date(), by = "day"))))
} else as.data.table(table(year(seq.Date(x, y, by = "day"))))
}
setDT(df)[, setNames(dateFunc(startDate, endDate), c("Year", "Days")), by = id]
# id Year Days
# 1: ZF001 2012 77
# 2: ZF001 2013 365
# 3: ZF001 2014 365
# 4: ZF001 2015 136
# 5: ZF002 2013 73
# 6: ZF002 2014 365
# 7: ZF002 2015 222
# 8: ZF003 2013 58
# 9: ZF003 2014 365
# 10: ZF003 2015 222
# 11: ZF004 2013 51
# 12: ZF004 2014 354
# 13: ZF005 2013 37
# 14: ZF005 2014 365
# 15: ZF005 2015 167
# 16: ZF006 2014 37
# 17: ZF006 2015 222
# 18: ZF007 2014 30
# 19: ZF007 2015 19
# 20: ZF008 2014 21
# 21: ZF008 2015 222
# 22: ZF009 2015 44
Data
df <- structure(list(id = structure(1:9, .Label = c("ZF001", "ZF002",
"ZF003", "ZF004", "ZF005", "ZF006", "ZF007", "ZF008", "ZF009"
), class = "factor"), census = structure(c(2L, 1L, 1L, 2L, 2L,
1L, 3L, 1L, 3L), .Label = c("alive", "died", "survived"), class = "factor"),
startDate = structure(c(15629, 15998, 16013, 16020, 16034,
16399, 16406, 16415, 16463), class = "Date"), endDate = structure(c(16571,
NA, NA, 16424, 16602, NA, 16454, NA, 16506), class = "Date")), .Names = c("id",
"census", "startDate", "endDate"), row.names = c(NA, -9L), class = "data.frame")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With