
Extremely high probability of being alive BTYD R

Tags:

r

I am working with the BTYD R package, and the probabilities that a customer is alive at the end of the calibration period come out extremely high. Even customers with only one transaction in the calibration period have a probability of around 0.9999. I know that the parameter "s" (estimated by the package) is used in this calculation. My gamma is very low (almost 0). When I manually changed it to a higher value, the probabilities went down. Any idea how to deal with this problem? My code is below.

elog <- dc.MergeTransactionsOnSameDate(elog)
end.of.cal.period <- min(elog$date) + as.numeric((max(elog$date) - min(elog$date)) / 2)

data <- dc.ElogToCbsCbt(elog, per="week", 
                        T.cal=end.of.cal.period,
                        merge.same.date=TRUE, 
                        statistic = "freq") 

cal2.cbs <- as.matrix(data[[1]][[1]])

## parameter estimation
params2 <- pnbd.EstimateParameters(cal2.cbs)

## log-likelihood
(LL <- pnbd.cbs.LL(params2, cal2.cbs))

p.matrix <- c(params2, LL)
for (i in 1:20) {
  params2 <- pnbd.EstimateParameters(cal2.cbs, params2)
  LL <- pnbd.cbs.LL(params2, cal2.cbs)
  p.matrix.row <- c(params2, LL)
  p.matrix <- rbind(p.matrix, p.matrix.row)
}

(params2 <- p.matrix[dim(p.matrix)[1],1:4])

# set up parameter names for a more descriptive result
param.names <- c("r", "alpha", "s", "beta")
names(params2) <- param.names

LL <- pnbd.cbs.LL(params2, cal2.cbs)

# PROBABILITY A CUSTOMER IS ALIVE AT END OF CALIBRATION / TRAINING
x <- cal2.cbs["123", "x"]         # x is frequency
t.x <- cal2.cbs["123", "t.x"]     # t.x is recency, i.e. time of the last transaction
T.cal <- 26 # week of end of cal, i.e. present
pnbd.PAlive(params2, x, t.x, T.cal)
Mila asked Jul 28 '15 11:07



1 Answer

There is no "gamma" parameter being estimated - "s" and "beta" define the gamma distribution of dropout rate heterogeneity. I recommend editing your post to include the parameters, as well as the output of

pnbd.PlotDropoutRateHeterogeneity(params2)
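As a rough sketch of what would be useful to report alongside that plot: the mean of each gamma distribution follows directly from the four parameters. The parameter values below are hypothetical placeholders, not the asker's estimates, and the variable names are made up for illustration.

```r
# Hypothetical Pareto/NBD estimates in BTYD's order: r, alpha, s, beta.
# These numbers are illustrative only, not results from the question's data.
params <- c(r = 0.55, alpha = 10.58, s = 0.61, beta = 11.66)

# Transaction rates follow a gamma(shape = r, rate = alpha) distribution and
# dropout rates a gamma(shape = s, rate = beta) distribution, so each mean is
# shape / rate.
mean.transaction.rate <- unname(params["r"] / params["alpha"])
mean.dropout.rate     <- unname(params["s"] / params["beta"])

print(round(c(transaction = mean.transaction.rate,
              dropout     = mean.dropout.rate), 4))
```

A mean dropout rate that is tiny relative to the length of the calibration period would be consistent with P(alive) values close to 1.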

Without seeing your parameter estimates or knowing the context of your data, there are at least two (not mutually exclusive) possibilities.

First, you could have a very low (possibly zero) dropout rate. If so, you can still fit a plain NBD model of the transaction rate and assume a zero dropout rate.
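As far as I know, BTYD does not ship a dedicated plain-NBD fitting function, so one option is to maximise the NBD likelihood directly with `optim`. The following is a minimal sketch under that assumption; the data are simulated, and `nbd.negLL`, `sim.x` and `sim.T` are hypothetical names introduced here.

```r
set.seed(1)

# Simulated calibration data (hypothetical): sim.x = number of transactions,
# sim.T = length of each customer's observation window in weeks.
n <- 2000
sim.T <- runif(n, 20, 40)
true.r <- 0.8
true.alpha <- 10
sim.x <- rnbinom(n, size = true.r, mu = true.r * sim.T / true.alpha)

# Negative log-likelihood of the NBD model (Poisson counts with gamma-distributed
# rates, shape r and rate alpha). Parameters are optimised on the log scale so
# that r and alpha stay positive.
nbd.negLL <- function(log.params, x, T.cal) {
  r <- exp(log.params[1])
  alpha <- exp(log.params[2])
  ll <- lgamma(r + x) - lgamma(r) - lgamma(x + 1) +
    r * log(alpha / (alpha + T.cal)) +
    x * log(T.cal / (alpha + T.cal))
  -sum(ll)
}

fit <- optim(c(0, 0), nbd.negLL, x = sim.x, T.cal = sim.T)
nbd.params <- setNames(exp(fit$par), c("r", "alpha"))
print(round(nbd.params, 3))
```

With real data you would pass the `x` and `T.cal` columns of the calibration matrix instead of the simulated vectors.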

Second, you could be seeing the "increasing frequency paradox". From pages 17-19 of one of Peter Fader/Bruce Hardie's papers:

For low frequency customers, there is an almost linear relationship between recency and [expected transactions]. However, this relationship becomes highly nonlinear for high frequency customers. In other words, for customers who have made a relatively large number of transactions in the past, recency plays a much bigger role in determining [value] than for an infrequent past purchaser.

According to the authors, a customer such as the one you describe, with few (or even just a single) transactions, receives a high probability of being "alive" with less dependence on recency. This is because, by definition, a low-frequency customer can have long "gaps" between purchases, so we should assign less risk to a lower-frequency customer even if they have not transacted for some time. Compare this to a high-frequency customer: the longer we go without seeing a transaction, the faster we should conclude that the customer is "dead", since we know they would ordinarily be making many transactions.
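Both effects can be illustrated with the individual-level Pareto/NBD expression for P(alive), which treats a customer's transaction rate lambda and dropout rate mu as known. This is a simplified sketch, not what `pnbd.PAlive` computes (that function averages over the gamma heterogeneity); the function name and all numbers are hypothetical.

```r
# P(alive | lambda, mu, t.x, T.cal) for a single customer with KNOWN rates:
# lambda = transaction rate, mu = dropout rate, both per week.
p.alive.individual <- function(lambda, mu, t.x, T.cal) {
  1 / (1 + mu / (lambda + mu) * (exp((lambda + mu) * (T.cal - t.x)) - 1))
}

# Hypothetical scenario: every customer last bought 10 weeks before the end
# of a 26-week calibration period.
T.cal <- 26
t.x <- 16

# Same dropout rate, different purchase frequency.
p.low  <- p.alive.individual(lambda = 0.05, mu = 0.02, t.x = t.x, T.cal = T.cal)
p.high <- p.alive.individual(lambda = 2,    mu = 0.02, t.x = t.x, T.cal = T.cal)

# Near-zero dropout rate: P(alive) stays close to 1 despite the 10-week gap.
p.nodrop <- p.alive.individual(lambda = 0.05, mu = 1e-6, t.x = t.x, T.cal = T.cal)

print(c(low.freq = p.low, high.freq = p.high, no.dropout = p.nodrop))
```

The low-frequency customer keeps a fairly high P(alive) after the same silence that makes the high-frequency customer almost certainly "dead", and a dropout rate near zero pushes P(alive) toward 1 for everyone.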

Geoffrey answered Nov 15 '22 09:11
