I am trying to model some data. I am having better luck with Excel than R, but the Excel solution won't scale, so I need to figure out how to do this in R.
Excel will map a trendline to the data, and a Power curve yields a reasonable y = 0.6462x^-0.542. When I put the same data into R and try to model it with the continuous power law in the poweRlaw package, I get something like y = 0.14901x^-3.03671. The intercept is way too small and the alpha is way too big.
# 14 days of % of users retained
y = c(0.61431 , 0.42585 , 0.35427 , 0.33893 , 0.28853 , 0.26004 , 0.2352 , 0.20087 , 0.17969 , 0.1848 , 0.17311 , 0.17092 , 0.15777 , 0.14901)
y.pl = conpl$new(y)
y.pl_est = estimate_xmin(y.pl)
y.pl_est
# $KS
# 0.1068587
#
# $xmin
# 0.14901
#
# $pars
# 3.03673
#
# $ntail
# 14
Is there a way to use lm or glm to do a power curve that gives a reasonable intercept and alpha?
It seems like Excel might be doing a linear model with normal errors on the log scale - I match the Excel results to as many decimal places as you share when I take logs of x and y before modeling.
Using @eipi10's nicely shared data frame:
dat = transform(dat, logx = log(x), logy = log(y))
mod = lm(logy ~ logx, data = dat)
## intercept
exp(coef(mod)[1])
# (Intercept)
# 0.6461621
## power
coef(mod)[2]
# logx
# -0.5424412
This works, of course, because if
y = a * x ^ b
then
log(y) = log(a) + b * log(x)
so the fitted coefficients of the linear model are log(a) and b in the power model.
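As a quick check of that identity, here is a minimal, self-contained sketch (it rebuilds the data frame from the question, in case dat is not already in your session) that fits on the log scale and back-transforms:

```r
# Minimal sketch: fit on the log scale, then back-transform.
# Uses the same 14-day retention data as in the question.
dat <- data.frame(
  x = 1:14,
  y = c(0.61431, 0.42585, 0.35427, 0.33893, 0.28853, 0.26004, 0.2352,
        0.20087, 0.17969, 0.1848, 0.17311, 0.17092, 0.15777, 0.14901)
)
mod <- lm(log(y) ~ log(x), data = dat)

a <- unname(exp(coef(mod)[1]))  # intercept on the original scale, ~0.6462
b <- unname(coef(mod)[2])       # power, ~-0.5424

# exp() of the log-scale fitted values is identical to a * x^b directly
pred <- exp(predict(mod))
all.equal(unname(pred), a * dat$x^b)
```

Note that log(y) and log(x) can go straight into the lm formula; there is no need for separate logy/logx columns unless you want them for other purposes.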
The difference is in the assumption about the error distribution. The other answer, using nls, minimizes squared error on the original scale - which is the MLE if you assume normally distributed errors in y. This method (and so, apparently, Excel's method as well) assumes that errors are normal on the log scale, which means assuming log-normal errors on the untransformed scale - and that may very well be more appropriate. (Though from the graph in eipi's answer we can see the differences in fitted values are very small.)
I haven't used the poweRlaw package, but R's base nls (non-linear least squares) function gives results similar to what you got with Excel. If there were a difference, after checking my code for errors, my first thought would be "so much the worse for Excel" :).
# Data
dat = data.frame(x=1:14,
y = c(0.61431 , 0.42585 , 0.35427 , 0.33893 , 0.28853 , 0.26004 , 0.2352 , 0.20087 , 0.17969 , 0.1848 , 0.17311 , 0.17092 , 0.15777 , 0.14901))
# Model
m1 = nls(y ~ a*x^b, list(a=1,b=1), data=dat)
summary(m1)
Formula: y ~ a * x^b
Parameters:
Estimate Std. Error t value Pr(>|t|)
a 0.62104 0.01307 47.51 4.94e-15 ***
b -0.51460 0.01525 -33.74 2.92e-13 ***
# Plot nls model
curve(coef(m1)[1]*x^coef(m1)[2], from=1, to=14)
# Add curve for Excel model in red
curve(0.6462*x^(-0.542), from=1, to=14, col="red", lty=2, add=TRUE)
# Add data points
points(dat$x, dat$y)
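One practical use of the fitted model is predicting retention past the observed 14 days. A sketch (day 30 is an arbitrary example, and extrapolating a power law beyond the data is itself a strong assumption):

```r
# Sketch: extrapolate the nls power fit beyond the observed range
dat <- data.frame(
  x = 1:14,
  y = c(0.61431, 0.42585, 0.35427, 0.33893, 0.28853, 0.26004, 0.2352,
        0.20087, 0.17969, 0.1848, 0.17311, 0.17092, 0.15777, 0.14901)
)
m1 <- nls(y ~ a * x^b, start = list(a = 1, b = 1), data = dat)

# Predicted fraction of users retained at day 30 (roughly 0.11 here)
predict(m1, newdata = data.frame(x = 30))
```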