I want to perform survival analysis (Kaplan-Meier and Cox PH modelling) on data that is both left and right censored. I'm looking at the time to occurrence of a heart arrhythmia (AF) in the presence versus the absence of a particular gene (Gene 0 or 1). However, some subjects are found to already have the arrhythmia at recruitment and so should be left censored. I've read the survival package documentation but can't work out how to account for the left censoring. Some made-up example data is below. Subjects 1 and 3 had AF at baseline and so should be left censored. Subject 2 did not experience the event by the end of follow-up and so is right censored. Subjects 4 and 5 both experienced the event (at 8 and 3 months respectively).
Gene <- c(0, 0, 1, 1, 0)
AF_at_baseline <- c(1, 0, 1, 0, 0)
Followup_time <- c(11, 3, 8, 15, 7)
AF_time <- c(NA, NA, NA, 8, 3)
AF_data <- data.frame(Gene, AF_at_baseline, Followup_time, AF_time)
Left-censoring occurs when we cannot observe the time when the event occurred. For obvious reasons, if the event is death, the data can't be left-censored. A good example is discussed in an ASA paper on survival analysis, “e.g. [a] study of age at which African children learn a task.”
For each data set, five methods for handling left-censored data were applied: (i) substitution with LOD/√2, (ii) lognormal maximum likelihood estimation (MLE) to estimate mean and standard deviation, (iii) Kaplan-Meier estimation (KM), (iv) imputation method using MLE to estimate distribution parameters (MI method 1), ...
The R package survival is used to carry out survival analysis. It contains the function Surv(), which takes the time and event-status variables and creates a survival object that serves as the response in a model formula. The function survfit() then estimates the survival curve, which can be drawn with plot().
Censored observations are subjects who either die of causes other than the disease of interest or are lost to follow-up. The aim of this paper is to show that the result of ignoring these 'censored observations' is an underestimation of the probability of survival beyond the fixed time-point.
I had a similar problem and solved it like this:
As stated in the survival help file, you need to specify time and time2.
You can think of left censored data as going from -infinity until the time you measured, and of right censored data as going from the time you measured (probably last follow-up) until +infinity. Infinity is best coded with NA.
What solved my problem was creating two vectors: a start vector time and a stop vector time2.
For time, you want all those values that are left censored to be NA. Right censored observations are filled in with the time of measurement, just like the events.
For time2 it is the other way around: right censored observations are NA, while left censored observations and events hold the time of measurement.
I don't really understand your data, however. Why would you follow up on subjects if they already had the event? This is what you do for subjects 4 and 5 by saying AF_time was 8 and 3 but Followup_time was 15 and 7.
Trying to help, I assume the following:
You have 5 patients with
AF_at_baseline<-c(1,0,1,0,0) #where 1 indicates left censoring
Follow-up times are event times (or last time of follow-up for left and right censored)
So the start vector, with NA for the left censored subjects, would look like this:
Followup_time <- c(NA, 3, NA, 15, 7)
And the stop vector, with NA for the right censored subjects:
Followup_time2 <- c(11, NA, 8, 15, 7)
# Since you indicated that only subject 2 didn't experience the event
Now you can call Surv:
library(survival)
Surv.Obj <- Surv(Followup_time, Followup_time2, type = 'interval2')
Surv.Obj
[1] 11- 3+ 8- 15 7 # with '-' indicating left censoring and '+' right censoring
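The two vectors above can also be derived from the question's data frame rather than typed by hand. This is only a minimal sketch, under the same assumption made above that Followup_time holds the event time or the last follow-up time. The names left_cens, right_cens, lower and upper are introduced here purely for illustration.

```r
library(survival)

# Made-up example data from the question
AF_data <- data.frame(
  Gene           = c(0, 0, 1, 1, 0),
  AF_at_baseline = c(1, 0, 1, 0, 0),
  Followup_time  = c(11, 3, 8, 15, 7),
  AF_time        = c(NA, NA, NA, 8, 3)
)

# Assumption: AF at baseline means left censored.
# Assumption: no baseline AF and no recorded AF_time means right censored.
left_cens  <- AF_data$AF_at_baseline == 1
right_cens <- !left_cens & is.na(AF_data$AF_time)

# interval2 coding: NA lower bound = left censored,
# NA upper bound = right censored, equal bounds = observed event
lower <- ifelse(left_cens,  NA_real_, AF_data$Followup_time)
upper <- ifelse(right_cens, NA_real_, AF_data$Followup_time)

surv_obj <- Surv(lower, upper, type = "interval2")
```

Printing surv_obj should show the same censoring pattern as Surv.Obj above, since lower and upper reproduce Followup_time and Followup_time2.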
Then you can call survfit and plot the Kaplan-Meier curve:
km <- survfit(Surv.Obj ~ 1, conf.type = "none")
km
Call: survfit(formula = Surv.Obj ~ 1, conf.type = "none")
      n  events  median 0.95LCL 0.95UCL
      5       4       7       7      NA
summary(km)
Call: survfit(formula = Surv.Obj ~ 1, conf.type = "none")
 time n.risk  n.event survival std.err lower 95% CI upper 95% CI
  7.0      4 3.00e+00     0.25   0.217       0.0458            1
  7.5      1 4.44e-16     0.25   0.217       0.0458            1
 15.0      1 1.00e+00     0.00     NaN           NA           NA
plot(km, conf.int = FALSE, mark.time = TRUE)
So far, I haven't found out how to fit a Cox PH model with interval data. See my question here.
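One possible workaround, offered as a suggestion and not as the answer to the linked question: coxph() does not accept interval-type Surv objects, but survreg() does, so a parametric model (Weibull here) can be fitted in its place. This is a different technique from Cox PH and rests on a distributional assumption that would need checking. The data below are simulated purely for illustration, and every variable name and parameter value is hypothetical.

```r
library(survival)

set.seed(1)
n <- 200

# Hypothetical simulated cohort: gene status and true AF onset time (months)
Gene      <- rbinom(n, 1, 0.5)
true_time <- rweibull(n, shape = 1.5, scale = ifelse(Gene == 1, 8, 12))

# Hypothetical observation window: recruitment and end of follow-up
recruit <- runif(n, 1, 5)
end_fu  <- recruit + runif(n, 5, 15)

left_cens  <- true_time < recruit  # AF already present at recruitment
right_cens <- true_time > end_fu   # no AF by end of follow-up

# interval2 coding: NA lower = left censored, NA upper = right censored,
# equal bounds = exactly observed event time
lower <- ifelse(left_cens,  NA_real_, ifelse(right_cens, end_fu,  true_time))
upper <- ifelse(right_cens, NA_real_, ifelse(left_cens,  recruit, true_time))

# Parametric (accelerated failure time) model, NOT a Cox model
fit <- survreg(Surv(lower, upper, type = "interval2") ~ Gene, dist = "weibull")
summary(fit)
```

The Gene coefficient from survreg() is on the log-time scale, so it is read as an acceleration factor and not as a hazard ratio.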