I have a coxph model with 5 time-dependent and 2 time-independent variables. I want to test the proportional hazards assumption and besides martingale and deviance residuals, using cox.zph. My question is, how does this function deal with time-dependent covariates?
After reading Grant et al.,2014, I am not sure if this is the recommended goodness-of-fit test to assess the PH assumption for time-varying covariates.
Model:
teste<-coxph(Surv(tempo1,tempo2,status)~sexo+CODE_06+factor(clima)+TP_media7
+ndvi+peso+epoca,data=newftable,na.action=na.fail)
> cox.zph(teste)
rho chisq p
sexoM 0.0844 0.32363 0.5694
CODE_06Regadio 0.1531 0.66865 0.4135
CODE_06Sequeiro 0.2278 1.65735 0.1980
factor(clima)8 -0.1823 1.16522 0.2804
factor(clima)9 0.1051 0.24456 0.6209
factor(clima)15 -0.0193 0.00945 0.9226
TP_media7(12,22] 0.1689 0.75604 0.3846
TP_media7(22,32] 0.1797 1.03731 0.3084
TP_media7(32,41] 0.1060 0.34036 0.5596
ndvi(3e+03,4e+03] -0.1595 1.00006 0.3173
ndvi(4e+03,5e+03] 0.0421 0.05233 0.8191
ndvi(5e+03,6e+03] 0.1750 0.98816 0.3202
ndvi(6e+03,8.05e+03] -0.0311 0.02880 0.8653
peso[850,1005] 0.2534 3.34964 0.0672
epocamid_inv_rep 0.0193 0.01219 0.9121
epocamid_pos_inv -0.2193 0.93355 0.3339
epocamid_rep_pos 0.0231 0.01341 0.9078
epocapos_repr 0.2073 1.09893 0.2945
epocarepr 0.0766 0.12905 0.7194
GLOBAL NA 19.79229 0.4072
The cox. zph function will test proportionality of all the predictors in the model by creating interactions with time using the transformation of time specified in the transform option.
A time-varying covariate (also called time-dependent covariate) is a term used in statistics, particularly in survival analyses. It reflects the phenomenon that a covariate is not necessarily constant through the whole study.
The Cox proportional-hazards regression model for time-to-event data may be used with covariates, independent variables, or predictor variables that vary over time. These are called time-dependent covariates. Their use is much more complicated in practice than the fixed (time-independent) covariates.
The first step in analyzing time-varying covariates in survival analysis is to reshape the data frame so that there are multiple rows (time intervals) for each subject, along with covariate values that apply across these intervals. Such a format is also known as the counting process style or (start, stop) form of data.
As I understand it cox.zph
is a test as to whether a covariate should enter the model as independent of time. If you already know that your predictor is time-dependent then this does not seem to be the appropriate approach. I'm not aware of an easy way to go about this and such a question may find a more receptive audience on Cross Validated.
For a reproducible example, we can use that from Therneau:
library(survival)
veteran$celltype <- relevel(veteran$celltype, ref="adeno")
f1 <- coxph(Surv(time, status) ~
trt + celltype + karno + diagtime + age + prior,
data=veteran)
(z1 <- cox.zph(f1, transform="log"))
rho chisq p
trt -0.01561 0.0400 0.841486
celltypesquamous -0.16278 3.8950 0.048431
celltypesmallcell -0.11908 2.2199 0.136238
celltypelarge 0.00942 0.0121 0.912551
karno 0.29329 11.8848 0.000566
diagtime 0.11317 1.6951 0.192930
age 0.20984 6.5917 0.010245
prior -0.16683 3.9873 0.045844
GLOBAL NA 27.5319 0.000572
rho is Pearson's correlation between the scaled Shoenfeld residuals and g(t) where g is a function of time (default is the Kaplan-Meier scale; here we are using log
, as you can see on the scale of the x axis in the plot below).
If the variable is time-invariant then the slope of the plotted line should be zero. This is essentially what chisq tests.
Update @Didi Ingabire - in light of your comments:
Thus a low p-value indicates:
coxph
model) may be violated by this variableYou can see this visually like so:
for (i in 1:(nrow(z1$table)-1)){
plot(z1[i], main="Scaled Schoenfeld residuals by time with smooth spline
If <0 indicates protective effect")
graphics::abline(a=0, b=0, col="black")
}
which gives e.g.:
Update @JMarcelino This is to say that cox.zph
is a test of the final form of the model, to ensure that the residuals are relatively constant over time.
If one of the variables is already a function of time (when it enters the model), this won't affect the test. In fact it should be more likely to produce a flat line with a high p-value if the influence of time is modeled correctly.
Also, testing proportional hazards means testing is the hazard ratio constant over time?. Whether the variable is time-dependent or not (when it enters the model) is unimportant. What is being tested is the final form of the model.
For example, instead of karno
we can enter a variable which is related to both it and to time like so:
f2 <- coxph(Surv(time, status) ~
trt + celltype + log(karno * time) + diagtime + age + prior,
data=veteran)
(z2 <- cox.zph(f2, transform="log"))
rho chisq p
trt 0.0947 1.4639 0.226
celltypesquamous -0.0819 1.1085 0.292
celltypesmallcell -0.0897 1.3229 0.250
celltypelarge 0.0247 0.0968 0.756
log(karno * time) -0.0836 0.6347 0.426
diagtime 0.0463 0.2723 0.602
age 0.0532 0.3493 0.554
prior -0.0542 0.3802 0.538
GLOBAL NA 7.6465 0.469
This gives us a model which better fits the proportional hazards assumption. However the interpretation of the coefficient log(karno * time)
is not particularly intuitive and unlikely to be of great practical value.
Its important to distinguish between time-dependent variable and a variable that does not meet the PH assumption.
A time-dependent variable is one that vary with time. This could be blood pressure; it will vary on different occasions. Sex (gender) will however not vary on different occasions.
Then there is a distinction between internal and external time-dependent variables:
• Internal time-dependent variables: are variables that vary because of changes within the individual (e.g blood pressure).
• External time-dependent variables: environmental/external changes that modify the hazard experienced by an individual (e.g as industries proliferate in a city, air pollution increases with time and so the hazard in the population increases for conditions such as myocardial infarction).
Regardless of the nature of a variable, fixed or time-dependent, it can violate the PH assumption. I could provide a few examples, but it is probably easier to just accept the fact that any variable might violate PH assumption. It can even if You try to accomodate the variations in an extended Cox model (e.g using multiple observations per individual in counting-process format).
The solution: you can enter any predictor into a Cox model and check if it fulfills the PH by the cox.zph function of Thernau's survival package. The corresponding statement in SAS would be the 'assess ph / resample' statement. If a variable violates PH, the (probably) simplest way to solve this is to introduce an interaction between that variable and your time variable.
Example follows. This is the Cox formula:
Survival = age + sex + blood_pressure
Lets say blood pressure violates PH --> Introduce the following term:
Survival = age + sex + blood_pressure*survival_time_variable
This should solve it but you cannot interpret the main effect of blood-pressure because that variable now depends om time.
Another solution is to stratify your model, but this would not be appropriate for a continuous variable and for categorical variables, once stratified, the variable is not included as a covariate in the resulting model (ie you wont be getting a hazard ratio).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With