
mlogit.data in R - error in 'row.names<-.data.frame`(`*tmp*`, value = c("

Tags: r, mlogit

I am trying to set up my data for the mlogit package in R, but I keep running into trouble.

My data frame is called choice2, and it looks like this:

id choice_id mode.ids choice weightloss adveffect inj tab infreq_1 infreq_3 cost
1        x1        A      0        3.5         0   1   0        1        0  550
1        x1        B      0       10.0         1   0   1        0        1   90
1        x1        C      1        0.0         0   0   0        0        0    0
1       x10        A      0        6.0         0   1   0        0        1   50
1       x10        B      0        3.5         1   0   1        1        0  165
1       x10        C      1        0.0         0   0   0        0        0    0
1       x11        A      0        2.0         1   1   0        0        1  165
1       x11        B      1        3.5         0   0   1        1        0   90
1       x11        C      0        0.0         0   0   0        0        0    0
1       x12        A      0       10.0         1   1   0        0        1  550

I set up my data for the mlogit package by running the following command:

require(mlogit)
CLOGIT <- mlogit.data(choice2,
                      choice = "choice",
                      shape = "long",
                      id.var = "id",
                      alt.var = "mode.ids",
                      varying = 5:11,
                      chid.var = "choice_id")

However, this results in the following error message:

Error in `row.names<-.data.frame`(`*tmp*`, value = c("x1.A", "x1.B", "x1.C",  : 
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘x1.A’, ‘x1.B’, ‘x1.C’, ‘x10.A’, ‘x10.B’, ‘x10.C’, ‘x11.A’, ‘x11.B’, ‘x11.C’, ‘x12.A’, ‘x12.B’, ‘x12.C’, ‘x13.A’, ‘x13.B’, ‘x13.C’, ‘x2.A’, ‘x2.B’, ‘x2.C’, ‘x3.A’, ‘x3.B’, ‘x3.C’, ‘x4.A’, ‘x4.B’, ‘x4.C’, ‘x5.A’, ‘x5.B’, ‘x5.C’, ‘x6.A’, ‘x6.B’, ‘x6.C’, ‘x7.A’, ‘x7.B’, ‘x7.C’, ‘x8.A’, ‘x8.B’, ‘x8.C’, ‘x9.A’, ‘x9.B’, ‘x9.C’ 

The structure of choice2 is as follows:

> str(choice2)
'data.frame':   7722 obs. of  11 variables:
$ id        : int  1 1 1 1 1 1 1 1 1 1 ...
$ choice_id : Factor w/ 13 levels "x1","x10","x11",..: 1 1 1 2 2 2 3 3 3 4 ...
$ mode.ids  : Factor w/ 3 levels "A","B","C": 1 2 3 1 2 3 1 2 3 1 ...
$ choice    : Factor w/ 2 levels "0","1": 1 1 2 1 1 2 1 2 1 1 ...
$ weightloss: num  3.5 10 0 6 3.5 0 2 3.5 0 10 ...
$ adveffect : int  0 1 0 0 1 0 1 0 0 1 ...
$ inj       : int  1 0 0 1 0 0 1 0 0 1 ...
$ tab       : int  0 1 0 0 1 0 0 1 0 0 ...
$ infreq_1  : int  1 0 0 0 1 0 0 1 0 0 ...
$ infreq_3  : int  0 1 0 1 0 0 1 0 0 1 ...
$ cost      : int  550 90 0 50 165 0 165 90 0 550 ...

Can anyone tell me what I might be doing wrong here? I have looked through the mlogit help documentation and through similar questions here on Stack Overflow, without success :)

All the best, Henrik

Asked by Hbrandi on Apr 08 '15 at 20:04


1 Answer

It appears that your choice_id variable indexes the choice occasion within each respondent. However, that is not what the chid variable (technically a component of an attribute) in an mlogit.data object represents: it indexes choice occasions across the whole dataset. So if respondents 1 and 2 were each presented with 13 choice tasks, the chid variable should be 1:26 rather than rep(1:13, 2). That is why you are getting the non-unique row names error: mlogit.data generates the row names as an interaction between the chid variable and the alternative variable, so a chid that restarts for each respondent produces the same row name more than once.
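
The collision can be reproduced in base R without loading mlogit. The sketch below is only an illustration: the `toy` data frame and the `chid_global` column are made up for this example, and `paste()` merely imitates the "chid.alternative" row names seen in the error message rather than calling mlogit's own code.

```r
# Toy data: 2 respondents, 3 choice tasks each, 3 alternatives per task.
toy <- data.frame(
  id        = rep(1:2, each = 9),
  choice_id = rep(rep(1:3, each = 3), times = 2),
  mode.ids  = rep(LETTERS[1:3], times = 6)
)

# Row names built from a task index that restarts for each respondent:
# every name occurs once per respondent, so half of them are duplicates.
per_respondent <- paste(toy$choice_id, toy$mode.ids, sep = ".")
n_dup_local <- sum(duplicated(per_respondent))

# Row names built from a task index that runs across the whole dataset
# (1:6 here, three rows per task) are all distinct.
toy$chid_global <- rep(1:6, each = 3)
dataset_wide <- paste(toy$chid_global, toy$mode.ids, sep = ".")
n_dup_global <- sum(duplicated(dataset_wide))

print(c(local = n_dup_local, global = n_dup_global))
```

With the per-respondent index, 9 of the 18 candidate row names are repeats, which is the condition `row.names<-` rejects; with the dataset-wide index there are none.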

But you don't need to worry about the chid variable, because mlogit.data will take care of it for you. Simply take out the chid.var argument in your call to mlogit.data, and you won't receive the error.

> require(mlogit)
> choice2 = data.frame(id = rep(1:2, each = 9),
+                      choice_id = rep(rep(1:3, each = 3), times = 2),
+                      mode.ids = rep(LETTERS[1:3], times = 6),
+                      choice = rep(c(0,0,1), times = 6),
+                      inj = runif(18) > 0.5)
> 
> # Causes error because chid.var is specified
> mlogit.data(choice2,
+             choice = 'choice',
+             shape = 'long',
+             id.var = 'id',
+             alt.var = 'mode.ids',
+             varying = 5,
+             chid.var = 'choice_id')
Error in `row.names<-.data.frame`(`*tmp*`, value = c("1.A", "1.B", "1.C",  : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1.A’, ‘1.B’, ‘1.C’, ‘2.A’, ‘2.B’, ‘2.C’, ‘3.A’, ‘3.B’, ‘3.C’ 
> 
> # Does not cause error because chid.var is not specified
> mlogit.data(choice2,
+             choice = 'choice',
+             shape = 'long',
+             id.var = 'id',
+             alt.var = 'mode.ids',
+             varying = 5)
    id choice_id mode.ids choice   inj
1.A  1         1        A  FALSE  TRUE
1.B  1         1        B  FALSE  TRUE
1.C  1         1        C   TRUE FALSE
2.A  1         2        A  FALSE FALSE
2.B  1         2        B  FALSE  TRUE
2.C  1         2        C   TRUE FALSE
3.A  1         3        A  FALSE FALSE
3.B  1         3        B  FALSE FALSE
3.C  1         3        C   TRUE  TRUE
4.A  2         1        A  FALSE  TRUE
4.B  2         1        B  FALSE FALSE
4.C  2         1        C   TRUE FALSE
5.A  2         2        A  FALSE FALSE
5.B  2         2        B  FALSE  TRUE
5.C  2         2        C   TRUE FALSE
6.A  2         3        A  FALSE  TRUE
6.B  2         3        B  FALSE FALSE
6.C  2         3        C   TRUE  TRUE
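
If you would rather keep an explicit choice-situation column than drop chid.var, one option is to build an index that is unique across respondents first. This is a minimal base R sketch, not something taken from the mlogit documentation: the `toy` data frame, the `key` vector and the `chid` column name are all assumptions made for illustration.

```r
# Toy data with a task label that restarts for each respondent,
# similar in shape to choice2 in the question.
toy <- data.frame(
  id        = rep(1:2, each = 9),
  choice_id = factor(paste0("x", rep(rep(1:3, each = 3), times = 2))),
  mode.ids  = rep(LETTERS[1:3], times = 6)
)

# Combine respondent and task into one key, then number the distinct
# keys in order of first appearance. Each (id, choice_id) pair gets its
# own integer, so the index no longer restarts at each respondent.
key <- paste(toy$id, toy$choice_id, sep = "_")
toy$chid <- match(key, unique(key))

# Check: the "chid.alternative" combinations are now all distinct.
row_labels <- paste(toy$chid, toy$mode.ids, sep = ".")
print(anyDuplicated(row_labels) == 0)
```

The new column could then be supplied as chid.var = "chid" in the mlogit.data call; that step is left out of the sketch because it has not been checked against every version of mlogit.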
Answered by Clara on Oct 04 '22 at 21:10