Multinomial logit: estimation on a subset of alternatives in R

Tags: r, subset, mlogit

As McFadden (1978) showed, if the number of alternatives in a multinomial logit model is so large that computation becomes impossible, it is still feasible to obtain consistent estimates by randomly subsetting the alternatives, so that the estimated probabilities for each individual are based on the chosen alternative and C other randomly selected alternatives. In this case, the size of the subset of alternatives is C+1 for each individual.
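For reference, a sketch of this result as it is usually stated. The symbols $D_n$ (the sampled subset for individual $n$), $V_{nj}$ (systematic utility), $J$ (size of the full choice set) and $\pi(D_n \mid j)$ (probability of drawing $D_n$ given that $j$ was chosen) are notation introduced here for illustration, not taken from the question.

```latex
% Conditional likelihood contribution of individual n, who chose i,
% given the sampled subset D_n (which contains i):
P_n(i \mid D_n) =
  \frac{\exp\!\big(V_{ni} + \ln \pi(D_n \mid i)\big)}
       {\sum_{j \in D_n} \exp\!\big(V_{nj} + \ln \pi(D_n \mid j)\big)}

% Chosen alternative plus C alternatives drawn uniformly without
% replacement from the remaining J - 1:
\pi(D_n \mid j) = \binom{J-1}{C}^{-1} \quad \text{for every } j \in D_n
\;\Longrightarrow\;
P_n(i \mid D_n) = \frac{\exp(V_{ni})}{\sum_{j \in D_n} \exp(V_{nj})}
```

Under this uniform sampling scheme the correction terms cancel, so an ordinary multinomial logit fitted to the C+1 sampled alternatives has the same form as the full model. With non-uniform sampling, the $\ln \pi$ terms would have to be kept as offsets.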

My question is about the implementation of this algorithm in R. Is it already embedded in any multinomial logit package? If not - which seems likely based on what I know so far - how would one go about including the procedure in pre-existing packages without recoding extensively?

asked Jan 27 '23 17:01 by Effa


1 Answer

It is not clear whether the question is about sampling the alternatives or about estimating an MNL model once the alternatives have been sampled. To my knowledge, no R package does the sampling of alternatives (the former) so far, but the estimation (the latter) is possible with existing packages such as mlogit. I believe the reason is that the sampling procedure depends on how your data are organized, but it is relatively easy to do with a little code of your own. Below is code adapted from what I used for this paper.

library(tidyverse)
# create artificial data
set.seed(6)
# data frame of chooser id and chosen alt_id
id_alt <- data.frame(
  id = 1:1000,
  # one independently drawn chosen alternative per chooser
  alt_chosen = sample(1:30, 1000, replace = TRUE)
)
# data frame for the universal choice set, with an alt-specific attribute (alt_x2)
alts <- data.frame(
  alt_id = 1:30,
  alt_x2 = runif(30)
)

# conduct sampling of 9 non-chosen alternatives
id_alt <- id_alt %>% 
  mutate(.alts_all =list(alts$alt_id),
         # use weights to avoid including chosen alternative in sample
         .alts_wtg = map2(.alts_all, alt_chosen, ~ifelse(.x==.y, 0, 1)),
         .alts_nonch = map2(.alts_all, .alts_wtg, ~sample(.x, size=9, prob=.y)),
         # combine chosen & sampled non-chosen alts
         alt_id = map2(alt_chosen, .alts_nonch, c)
  ) 

# unnest the data frame above into long format,
# with one row per chooser id and alt_id
id_alt_lf <- id_alt %>% 
  select(-starts_with(".")) %>%
  unnest(alt_id)

# join long format df with alts to get alt-specific attributes
id_alt_lf <- id_alt_lf %>% 
  left_join(alts, by="alt_id") %>% 
  mutate(chosen=ifelse(alt_chosen==alt_id, 1, 0))

library(mlogit)
# convert to mlogit data frame before estimating
id_alt_mldf <- mlogit.data(id_alt_lf, 
                           choice="chosen", 
                           chid.var="id", 
                           alt.var="alt_id", shape="long")
mlogit( chosen ~ 0 + alt_x2, id_alt_mldf) %>% 
  summary()

The same sampling can, of course, be done without the purrr::map functions, using apply variants or a loop over the rows of id_alt.
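A minimal base R sketch of that alternative, assuming the same artificial setup (1000 choosers, 30 alternatives, 9 sampled non-chosen alternatives). The helper sample_alts() and the variables n_alts and n_sampled are hypothetical names introduced here for illustration; they are not part of mlogit or any other package.

```r
# Base R version of the sampling step (no purrr).
set.seed(6)
n_alts <- 30     # size of the universal choice set
n_sampled <- 9   # number of non-chosen alternatives to draw

id_alt <- data.frame(
  id = 1:1000,
  alt_chosen = sample(seq_len(n_alts), 1000, replace = TRUE)
)

# hypothetical helper: chosen alternative first, then the sampled others
sample_alts <- function(chosen, all_alts, n_sampled) {
  # candidates exclude the chosen alternative
  candidates <- all_alts[all_alts != chosen]
  # draw positions, which sidesteps sample()'s special handling of
  # a length-one numeric vector
  c(chosen, candidates[sample.int(length(candidates), n_sampled)])
}

sampled <- lapply(id_alt$alt_chosen, sample_alts,
                  all_alts = seq_len(n_alts), n_sampled = n_sampled)

# long format: one row per chooser id and sampled alt_id
id_alt_lf <- data.frame(
  id = rep(id_alt$id, each = n_sampled + 1),
  alt_chosen = rep(id_alt$alt_chosen, each = n_sampled + 1),
  alt_id = unlist(sampled)
)
id_alt_lf$chosen <- as.integer(id_alt_lf$alt_id == id_alt_lf$alt_chosen)
```

The resulting id_alt_lf has the same columns as the long-format data frame built above, so it can be joined to alts and passed to mlogit.data in the same way.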

answered Feb 03 '23 06:02 by LmW.