Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Gompertz Aging analysis in R

I have survival data from an experiment in flies which examines rates of aging in various genotypes. The data is available to me in several layouts so the choice of which is up to you, whichever suits the answer best.

One dataframe (wide.df) looks like this, where each genotype (Exp, of which there is ~640) has a row, and the days run in sequence horizontally from day 4 to day 98 with counts of new deaths every two days.

Exp      Day4   Day6    Day8    Day10   Day12   Day14    ...
A        0      0       0       2       3       1        ...

I make the example using this:

wide.df2<-data.frame("A",0,0,0,2,3,1,3,4,5,3,4,7,8,2,10,1,2)
colnames(wide.df2)<-c("Exp","Day4","Day6","Day8","Day10","Day12","Day14","Day16","Day18","Day20","Day22","Day24","Day26","Day28","Day30","Day32","Day34","Day36")

Another version is like this, where each day has a row for each 'Exp' and the number of deaths on that day are recorded.

Exp     Deaths  Day     
A       0       4    
A       0       6
A       0       8
A       2       10
A       3       12
..      ..      ..

To make this example:

df2<-data.frame(c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A"),c(0,0,0,2,3,1,3,4,5,3,4,7,8,2,10,1,2),c(4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36))
    colnames(df2)<-c("Exp","Deaths","Day")

What I would like to do is perform a Gompertz Analysis (See second paragraph of "the life table" here). The equation is:

μx = α*e β*x

Where μx is probability of death at a given time, α is initial mortality rate, and β is the rate of aging.

I would like to be able to get a dataframe which has α and β estimates for each of my ~640 genotypes for further analysis later.

I need help going from the above dataframes to an output of these values for each of my genotypes in R.

I have looked through the package flexsurv which may house the answer but I have failed in attempts to find and implement it.

like image 218
rg255 Avatar asked Oct 21 '22 05:10

rg255


1 Answers

This should get you started...

Firstly, for the flexsurvreg function to work, you need to specify your input data as a Surv object (from package:survival). This means one row per observation.

The first thing is to re-create the 'raw' data from the summary tables you provide. (I know rbind is not efficient, but you can always switch to data.table for large sets).

### get rows with >1 death
df3 <- df2[df2$Deaths>1, 2:3]
### expand to give one row per death per time
df3 <- sapply(df3, FUN=function(x) rep(df3[, 2], df3[, 1]))
### each death is 1 (occurs once)
df3[, 1] <- 1
### add this to the rows with <=1 death
df3 <- rbind(df3, df2[!df2$Deaths>1, 2:3])
### convert to Surv object
library(survival)
s1 <- with(df3, Surv(Day, Deaths))
### get parameters for Gompertz distribution
library(flexsurv) 
f1 <- flexsurvreg(s1 ~ 1, dist="gompertz")

giving

> f1$res
              est         L95%        U95%
shape 0.165351912 0.1281016481 0.202602176
rate  0.001767956 0.0006902161 0.004528537

Note that this is an intercept-only model as all your genotypes are A. You can loop this over multiple survival objects once you have re-created the per-observation data as above.

From the flexsurv docs:

Gompertz distribution with shape parameter a and rate parameter b has hazard function

H(x: a, b) = b.e^{ax}

So it appears your alpha is b, the rate, and beta is a, the shape.

like image 143
dardisco Avatar answered Oct 27 '22 11:10

dardisco