 

Power Law in Excel works better than R?

Tags:

r

modeling

I am trying to model some data. I am having better luck with Excel than R, but the Excel solution won't scale, so I need to figure out how to do this in R.

Excel fits a trendline to the data, and a Power trendline yields a reasonable y = 0.6462x^-0.542.

When I put the same data into R and try to model it with the continuous power-law in the poweRlaw package I get something like y = 0.14901x^-3.03671. The intercept is way too small and the alpha is way too big.

# 14 days of % of users retained
library(poweRlaw)
y = c(0.61431, 0.42585, 0.35427, 0.33893, 0.28853, 0.26004, 0.2352,
      0.20087, 0.17969, 0.1848, 0.17311, 0.17092, 0.15777, 0.14901)

y.pl = conpl$new(y)
y.pl_est = estimate_xmin(y.pl)
y.pl_est

# $KS
# 0.1068587
#
# $xmin
# 0.14901
#
# $pars
# 3.03673
#
# $ntail
# 14

Image shows how the models built from 14 days of data with Excel and R's conpl() match up to the actuals out to day 60.

Is there a way to use lm or glm to fit a power curve that gives a reasonable intercept and alpha?

MAlex asked Dec 08 '22


2 Answers

It seems Excel is doing a linear model with normal errors on a log scale - I match the Excel results to as many decimal places as you shared when I take logs of x and y before modeling.

Using @eipi10's nicely shared data frame:

dat = transform(dat, logx = log(x), logy = log(y))
mod = lm(logy ~ logx, data = dat)

## intercept
exp(coef(mod)[1])
# (Intercept) 
#   0.6461621 

## power
coef(mod)[2]
#       logx 
# -0.5424412 

This works, of course, because if

      y = a * x ^ b
 log(y) = log(a) + b * log(x)

So the fitted coefficients of the linear model are log(a) and b in the power model.
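As a quick check, the back-transform can be used directly for prediction. A minimal self-contained sketch (it rebuilds the same `dat` data frame used elsewhere in this thread):

```r
# Sketch: recover a and b from the log-log fit and predict with y = a * x^b
# (uses the same 14-day retention data shared in the question)
dat <- data.frame(x = 1:14,
                  y = c(0.61431, 0.42585, 0.35427, 0.33893, 0.28853, 0.26004,
                        0.2352, 0.20087, 0.17969, 0.1848, 0.17311, 0.17092,
                        0.15777, 0.14901))
mod <- lm(log(y) ~ log(x), data = dat)
a <- unname(exp(coef(mod)[1]))  # intercept on the original scale, ~0.6462
b <- unname(coef(mod)[2])       # exponent, ~-0.5424
# Extrapolate retention out to day 60 with the fitted power curve
pred_day60 <- a * 60^b
```

Since a and b match the Excel coefficients, this extrapolation to day 60 should track Excel's trendline closely.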

The difference is the assumed error distribution. The other answer using nls minimizes squared error on the original, untransformed scale - which is the MLE if you assume normally distributed errors in y. This method (and apparently Excel's as well) assumes errors that are normal on the log scale, which means log-normal errors on the untransformed scale - and that may very well be more appropriate. (Though from the graph in eipi10's answer we can see the differences in fitted values are very small.)

Gregor Thomas answered Jan 04 '23


I haven't used the poweRlaw package, but R's base nls (non-linear least squares) function gives results similar to what you got with Excel. If there were a difference, after checking my code for errors, my first thought would be "so much the worse for Excel" :).

# Data
dat = data.frame(x = 1:14,
                 y = c(0.61431, 0.42585, 0.35427, 0.33893, 0.28853, 0.26004,
                       0.2352, 0.20087, 0.17969, 0.1848, 0.17311, 0.17092,
                       0.15777, 0.14901))

# Model
m1 = nls(y ~ a*x^b, start = list(a=1, b=1), data = dat)
summary(m1)

Formula: y ~ a * x^b

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
a   0.62104    0.01307   47.51 4.94e-15 ***
b  -0.51460    0.01525  -33.74 2.92e-13 ***

# Plot nls model
curve(coef(m1)[1]*x^coef(m1)[2], from=1, to=14)

# Add curve for Excel model in red
curve(0.6462*x^(-0.542), from=1, to=14, col="red", lty=2, add=TRUE)

# Add data points
points(dat$x, dat$y)
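To quantify how close the two fits are, here is a small self-contained sketch comparing the nls fitted values with the log-scale lm fit from the other answer over days 1-14; the two curves are close but not identical:

```r
# Sketch: compare fitted values from nls (normal errors in y)
# and lm on logs (log-normal errors in y) on the same data
dat <- data.frame(x = 1:14,
                  y = c(0.61431, 0.42585, 0.35427, 0.33893, 0.28853, 0.26004,
                        0.2352, 0.20087, 0.17969, 0.1848, 0.17311, 0.17092,
                        0.15777, 0.14901))
m_nls <- nls(y ~ a * x^b, start = list(a = 1, b = 1), data = dat)
m_log <- lm(log(y) ~ log(x), data = dat)
fit_nls <- predict(m_nls, newdata = dat)
fit_log <- exp(predict(m_log, newdata = dat))
max(abs(fit_nls - fit_log))  # largest gap between the two fitted curves
```

The largest gap occurs at day 1, where the log-scale fit sits slightly above the nls fit.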


eipi10 answered Jan 04 '23