I have binned data that I'm trying to run a survival analysis on; example data is below. n is the number of units for each combination of group, time, and failure indicator.
> df <- structure(list(group = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "A", "B"), class = "factor"), t = c(0L, 1L, 2L, 3L, 1L, 2L, 3L, 0L, 1L, 2L, 3L, 1L, 2L, 3L), failure = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), n = c(40000L, 30000L, 20000L, 10000L, 5L, 4L, 3L, 20000L, 15000L, 14000L, 11000L, 10L, 6L, 4L)), .Names = c("group", "t", "failure", "n"), row.names = c(NA, 14L), class = "data.frame")
> df
   group t failure     n
1      A 0       0 40000
2      A 1       0 30000
3      A 2       0 20000
4      A 3       0 10000
5      A 1       1     5
6      A 2       1     4
7      A 3       1     3
8      B 0       0 20000
9      B 1       0 15000
10     B 2       0 14000
11     B 3       0 11000
12     B 1       1    10
13     B 2       1     6
14     B 3       1     4
I know I can repeat each row of df n times with rep, so that each row is one unit:
(ref. How do I create a survival object in R?)
> library(survival)
> df2 <- df[rep(rownames(df),df$n),]
> sfit <- survfit(Surv(t,failure)~group, data = df2)
However, my actual data has about 10 million units. Is there a way to do survival with a count/frequency variable to avoid creating a 10 million row data frame?
You'll want to use the weights argument of survfit, passing the counts in n so that each row counts as n units. You can compare the two approaches to confirm that they give the same output.
With your data that you repeated:
sfit <- survfit(Surv(t,failure)~group, data = df2)
summary(sfit)
Call: survfit(formula = Surv(t, failure) ~ group, data = df2)
                group=A 
 time n.risk n.event survival  std.err lower 95% CI upper 95% CI
    1  60012       5    1.000 3.73e-05        1.000            1
    2  30007       4    1.000 7.63e-05        1.000            1
    3  10003       3    0.999 1.89e-04        0.999            1
                group=B 
 time n.risk n.event survival  std.err lower 95% CI upper 95% CI
    1  40020      10    1.000 0.000079        1.000            1
    2  25010       6    1.000 0.000126        0.999            1
    3  11004       4    0.999 0.000221        0.999            1
Now using weights:
weights <- df$n
sfit2 <- survfit(Surv(t,failure)~group, data = df, weights = weights)
summary(sfit2)
Call: survfit(formula = Surv(t, failure) ~ group, data = df, weights = weights)
                group=A 
 time n.risk n.event survival  std.err lower 95% CI upper 95% CI
    1  60012       5    1.000 3.73e-05        1.000            1
    2  30007       4    1.000 7.63e-05        1.000            1
    3  10003       3    0.999 1.89e-04        0.999            1
                group=B 
 time n.risk n.event survival  std.err lower 95% CI upper 95% CI
    1  40020      10    1.000 0.000079        1.000            1
    2  25010       6    1.000 0.000126        0.999            1
    3  11004       4    0.999 0.000221        0.999            1
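Rather than comparing the two summaries by eye, the fits can also be checked programmatically. The following is a minimal sketch, assuming the survival package is installed; the data frame is rebuilt from the example in the question, the names fit_expanded and fit_weighted are only illustrative, and the counts are passed directly as the column n instead of through a separate weights vector.

```r
library(survival)

# Binned example data from the question: one row per
# group / time / failure-indicator combination, with count n.
df <- data.frame(
  group   = factor(rep(c("A", "B"), each = 7)),
  t       = rep(c(0, 1, 2, 3, 1, 2, 3), times = 2),
  failure = rep(c(0, 0, 0, 0, 1, 1, 1), times = 2),
  n       = c(40000, 30000, 20000, 10000, 5, 4, 3,
              20000, 15000, 14000, 11000, 10, 6, 4)
)

# Approach 1: expand to one row per unit (only feasible for small data).
df2 <- df[rep(seq_len(nrow(df)), df$n), ]
fit_expanded <- survfit(Surv(t, failure) ~ group, data = df2)

# Approach 2: keep the binned rows and pass the counts as weights.
fit_weighted <- survfit(Surv(t, failure) ~ group, data = df, weights = n)

# Compare the pieces of the two fits that the summaries report.
same_time  <- isTRUE(all.equal(fit_expanded$time,   fit_weighted$time))
same_surv  <- isTRUE(all.equal(fit_expanded$surv,   fit_weighted$surv))
same_nrisk <- isTRUE(all.equal(fit_expanded$n.risk, fit_weighted$n.risk))
print(c(time = same_time, surv = same_surv, n.risk = same_nrisk))
```

If the three checks agree on the small example, the weighted call can be used on the full binned data without ever building the expanded data frame.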