Left censoring for survival data in R


I want to perform survival analysis (Kaplan-Meier and Cox PH modelling) on data that are both left- and right-censored. I'm looking at the time to occurrence of a heart arrhythmia (AF) in the presence versus the absence of a particular gene (Gene = 1 or 0). However, some subjects already have the arrhythmia at recruitment, so they should be left-censored. I've read the survival package documentation but can't work out how to account for the left censoring. Some made-up example data are below. Subjects 1 and 3 had AF at baseline and so should be left-censored. Subject 2 did not experience the event by the end of follow-up and so is right-censored. Subjects 4 and 5 both experienced the event (at 8 and 3 months respectively).

Gene <- c(0, 0, 1, 1, 0)
AF_at_baseline <- c(1, 0, 1, 0, 0)
Followup_time <- c(11, 3, 8, 15, 7)
AF_time <- c(NA, NA, NA, 8, 3)
AF_data <- data.frame(Gene, AF_at_baseline, Followup_time, AF_time)
BenC asked Jan 31 '17 22:01



1 Answer

I had a similar problem and solved it like this:

As stated in the survival help file (?Surv), you need to specify both time and time2.

You can think of a left-censored observation as an interval running from -infinity up to the time you measured, and of a right-censored observation as an interval running from the time you measured (probably the last follow-up) up to +infinity. Infinity is best coded with NA.
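As a minimal sketch of this interval view (assuming the survival package is installed; the times are made up for illustration), each row below is one kind of observation coded as a (time, time2) pair for type = 'interval2'. The status codes checked at the end (0 = right-censored, 1 = event, 2 = left-censored, 3 = interval-censored) are the ones ?Surv documents for interval data.

```r
library(survival)

# Made-up observations, one of each kind, written as (lower, upper) bounds:
#   row 1: left-censored at 5       -> from -Inf up to 5 -> lower NA, upper 5
#   row 2: right-censored at 5      -> from 5 up to +Inf -> lower 5,  upper NA
#   row 3: exact event at 5         -> from 5 to 5       -> lower 5,  upper 5
#   row 4: event between 2 and 5    -> from 2 to 5       -> lower 2,  upper 5
lower <- c(NA, 5, 5, 2)
upper <- c(5, NA, 5, 5)

s <- Surv(lower, upper, type = "interval2")
print(s)

# For interval-type Surv objects the third column of the underlying matrix
# holds the status code: 0 = right, 1 = event, 2 = left, 3 = interval censored
status_code <- unclass(s)[, 3]
print(status_code)
```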

What solved my problem was creating two vectors: a start vector, time, and a stop vector, time2.

In time, every left-censored observation should be NA. Right-censored observations are filled in with the time of measurement, just like the events.

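For time2 it is the other way around: right-censored observations are NA, while left-censored observations and events get the time of measurement.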
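The rule for building time and time2 can be written as a small function. This is only a sketch: make_interval2() and its type labels ("left", "right", "event") are hypothetical names, not part of the survival package. It is applied to the same five subjects that the example in this answer assumes.

```r
# Hypothetical helper (not part of the survival package).
# obs_time: the measured time for each subject
# type:     "left" (left-censored), "right" (right-censored) or "event"
make_interval2 <- function(obs_time, type) {
  stopifnot(length(obs_time) == length(type),
            all(type %in% c("left", "right", "event")))
  # start vector: NA (i.e. -Inf) for left-censored rows, measured time otherwise
  time <- ifelse(type == "left", NA_real_, obs_time)
  # stop vector: NA (i.e. +Inf) for right-censored rows, measured time otherwise
  time2 <- ifelse(type == "right", NA_real_, obs_time)
  list(time = time, time2 = time2)
}

# The five subjects as interpreted in this answer
iv <- make_interval2(obs_time = c(11, 3, 8, 15, 7),
                     type = c("left", "right", "left", "event", "event"))
print(iv$time)   # NA  3 NA 15  7
print(iv$time2)  # 11 NA  8 15  7
```

The two vectors can then be passed straight to Surv(iv$time, iv$time2, type = 'interval2').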

I don't really understand your data, however. Why would you keep following up subjects after they have had the event? That is what you do for subjects 4 and 5: AF_time is 8 and 3, but Followup_time is 15 and 7.

To try to help anyway, I'll assume the following:

You have 5 patients with

AF_at_baseline<-c(1,0,1,0,0) #where 1 indicates left censoring

Follow-up times are event times (or, for left- and right-censored subjects, the time of the last follow-up).

So for the start vector (time), with the left-censored subjects set to NA, your Followup_time would look like this:

Followup_time <- c(NA, 3, NA, 15, 7)

For the stop vector (time2), with the right-censored subject set to NA:

Followup_time2 <- c(11, NA, 8, 15, 7)
# since you indicated that only subject 2 didn't experience the event

Now you can call Surv:

library(survival)
Surv.Obj <- Surv(Followup_time, Followup_time2, type = 'interval2')
Surv.Obj
[1] 11-  3+  8- 15   7 # with '-' indicating left censoring and '+' right censoring

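Then you can call survfit and plot the survival curve. Note that when the data contain left- or interval-censored observations, survfit computes Turnbull's nonparametric estimate (a generalization of Kaplan-Meier) rather than the ordinary Kaplan-Meier estimate: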

km <- survfit(Surv.Obj ~ 1, conf.type = "none")
km
Call: survfit(formula = Surv.Obj ~ 1, conf.type = "none")

      n  events  median 0.95LCL 0.95UCL 
      5       4       7       7      NA 

summary(km)
Call: survfit(formula = Surv.Obj ~ 1, conf.type = "none")

 time n.risk  n.event survival std.err lower 95% CI upper 95% CI
  7.0      4 3.00e+00     0.25   0.217       0.0458            1
  7.5      1 4.44e-16     0.25   0.217       0.0458            1
 15.0      1 1.00e+00     0.00     NaN           NA           NA


plot(km, conf.int = FALSE, mark.time = TRUE)
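Since the question compares subjects with and without the gene, here is a sketch of one curve per Gene group, assuming the answer's interval2 coding above and the question's Gene vector. With only two or three subjects per group, this only illustrates the syntax; the estimates themselves mean nothing.

```r
library(survival)

# Answer's interval2 coding stored next to the question's Gene column
AF_data <- data.frame(
  Gene  = c(0, 0, 1, 1, 0),
  time  = c(NA, 3, NA, 15, 7),   # NA = left censored (-Inf)
  time2 = c(11, NA, 8, 15, 7)    # NA = right censored (+Inf)
)

# One nonparametric (Turnbull) survival curve per Gene group
km_gene <- survfit(Surv(time, time2, type = "interval2") ~ Gene,
                   data = AF_data, conf.type = "none")
print(km_gene)

plot(km_gene, lty = 1:2, xlab = "Months", ylab = "Proportion without AF")
legend("topright", legend = c("Gene 0", "Gene 1"), lty = 1:2)
```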

So far, I haven't found out how to fit a Cox PH model to interval-censored data. See my question here.
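A sketch of what happens inside the survival package itself, assuming the same interval2 coding and the question's Gene vector: coxph() accepts only right-censored or counting-process Surv objects, so it stops with an error on interval data. One alternative in the same package is survreg(), which fits a parametric accelerated failure time model (Weibull here) and does accept left-, right- and interval-censored observations. This swaps in a different model; it is not a Cox PH fit.

```r
library(survival)

time  <- c(NA, 3, NA, 15, 7)   # NA = left censored
time2 <- c(11, NA, 8, 15, 7)   # NA = right censored
Gene  <- c(0, 0, 1, 1, 0)
y <- Surv(time, time2, type = "interval2")

# coxph() rejects interval-type Surv objects; capture the error message
cox_msg <- tryCatch({
  coxph(y ~ Gene)
  "fitted"
}, error = function(e) conditionMessage(e))
print(cox_msg)

# Parametric Weibull AFT model (NOT proportional hazards in the Cox sense);
# with 5 subjects the estimates only illustrate the syntax
aft <- survreg(y ~ Gene, dist = "weibull")
summary(aft)
```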

Frederick answered Sep 25 '22 10:09