 

How to use R for multiple select questions?

Tags:

r

I am trying to figure out how to analyze multiple-select (multiple-response, i.e., 'select all that apply') questions in a survey I recently conducted.

SPSS has good support for analyzing online survey data, including this type of question, so I am guessing that R can do the same and more. These answers are tricky to handle in Excel. For example, I want a histogram/distribution, by age, of everyone who likes both strawberry and chocolate ice cream.

How should I structure the data set, and what commands would perform some basic tabulations: frequencies, Pareto ordering, and logical AND/OR combinations?
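A minimal sketch of one common layout for this kind of data, assuming hypothetical column names (age, strawberry, chocolate, vanilla) and made-up values: store each 'select all that apply' option as its own 0/1 indicator column, one row per respondent, with other variables such as age alongside. AND (&) and OR (|) combinations then become plain logical vectors that can be counted or cross-tabulated by age band.

```r
# Hypothetical survey data: one row per respondent, one 0/1 indicator
# column per "select all that apply" option (1 = selected, 0 = not).
survey <- data.frame(
  age        = c(23, 35, 41, 29, 52, 38, 19, 44, 31, 60),
  strawberry = c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0),
  chocolate  = c(1, 1, 1, 0, 0, 1, 1, 0, 1, 1),
  vanilla    = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1)
)

# Frequency of each option: how many respondents selected it.
option_counts <- colSums(survey[c("strawberry", "chocolate", "vanilla")])

# Logical AND / OR across options: one TRUE/FALSE per respondent.
both   <- survey$strawberry == 1 & survey$chocolate == 1
either <- survey$strawberry == 1 | survey$chocolate == 1

# Age bands [0,30), [30,40), [40,50), [50,Inf), then count the
# respondents who like strawberry AND chocolate in each band.
age_band <- cut(survey$age, breaks = c(0, 30, 40, 50, Inf), right = FALSE,
                labels = c("<30", "30-39", "40-49", "50+"))
both_by_age <- table(age_band[both])

option_counts
c(both = sum(both), either = sum(either))
both_by_age
# hist(survey$age[both]) would plot the age histogram for that group.
```

Keeping one column per option (rather than one text column listing all selections) is what makes colSums(), &, | and table() work directly.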

asked Jul 24 '12 at 00:07 by JHo



2 Answers

I haven't found anything quite as convenient as SPSS's multiple response sets. However, you can easily group columns by a common name prefix and then use apply() or one of its relatives to iterate over each group. Here's one approach using adply() from the plyr package:

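library(plyr)
# The output below matches R < 3.6.0; in newer R, use
# set.seed(1, sample.kind = "Rounding") to draw the same sample.
set.seed(1)
# Fake data with three "like" questions: 0 = not selected, 1 = selected
dat <- data.frame(resp = 1:10,
                  like1 = sample(0:1, 10, TRUE),
                  like2 = sample(0:1, 10, TRUE),
                  like3 = sample(0:1, 10, TRUE)
                  )

# For each "like" column (margin 2): count and proportion of 1s.
# sum()/mean() of x == 1 also return 0 (not NA) if nobody selected an option.
adply(dat[grepl("like", colnames(dat))], 2, function(x)
  data.frame(Count = sum(x == 1),
             Perc  = mean(x == 1)))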
#-----
     X1 Count Perc
1 like1     6  0.6
2 like2     5  0.5
3 like3     3  0.3
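The same grouping idea can be sketched in base R, without plyr, for several questions at once. This assumes, hypothetically, that every option column is named as a question prefix plus a number (like1, like2, buy1, ...), so stripping the trailing digits with sub() gives the group, and split() plus lapply() summarise each group. The data and the "buy" question are made up for illustration.

```r
# Hypothetical data: two multiple-response questions ("like" and "buy"),
# each stored as 0/1 indicator columns sharing a common name prefix.
dat <- data.frame(resp  = 1:5,
                  like1 = c(1, 0, 1, 1, 0),
                  like2 = c(0, 0, 1, 1, 1),
                  buy1  = c(1, 1, 0, 0, 0),
                  buy2  = c(0, 1, 1, 0, 1),
                  buy3  = c(0, 0, 0, 1, 1))

items  <- setdiff(names(dat), "resp")
# Group label = column name with its trailing digits removed.
groups <- split(items, sub("[0-9]+$", "", items))

# For each group: count and proportion of respondents selecting each option.
summaries <- lapply(groups, function(cols) {
  sub_dat <- dat[cols]
  data.frame(Option = cols,
             Count  = colSums(sub_dat == 1),
             Perc   = colMeans(sub_dat == 1),
             row.names = NULL)
})
summaries
```

split() returns the groups in sorted order of their labels, so here the list holds "buy" before "like".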
answered Nov 09 '22 at 19:11 by Chase


I recently wrote a quick function to deal with these. You could easily modify it to add the proportion of total responses as well.

set.seed(1)
dat <- data.frame(resp = 1:10,
                  like1 = sample(0:1, 10, TRUE),
                  like2 = sample(0:1, 10, TRUE),
                  like3 = sample(0:1, 10, TRUE))

The function:

multi.freq.table = function(data, sep="", dropzero=FALSE, clean=TRUE) {
  # Takes 0/1 (boolean) multiple-response data and tabulates it
  #   by every possible combination of the variables.
  #
  # See: http://stackoverflow.com/q/11348391/1270695

  # table() cross-tabulates all columns; as a data frame, each row is one
  #   combination of 0/1 values and the last column (Freq) is its count.
  counts = data.frame(table(data))
  N = ncol(counts)
  # Label each combination with the names of the variables equal to 1.
  counts$Combn = apply(counts[-N] == 1, 1,
                       function(x) paste(names(counts[-N])[x],
                                         collapse=sep))
  if (isTRUE(dropzero)) {
    counts = counts[counts$Freq != 0, ]
  }
  if (isTRUE(clean)) {
    counts = data.frame(Combn = counts$Combn, Freq = counts$Freq)
  }
  counts
}

Apply the function:

multi.freq.table(dat[-1], sep="-")
#               Combn Freq
# 1                      1
# 2             like1    2
# 3             like2    2
# 4       like1-like2    2
# 5             like3    1
# 6       like1-like3    1
# 7       like2-like3    0
# 8 like1-like2-like3    1
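As an alternative sketch (not the function above), each respondent's combination can be labelled directly with apply() and paste(), and table() plus prop.table() then give counts and the proportion of all respondents. Unlike multi.freq.table, this lists only combinations that actually occur (similar to dropzero = TRUE). The 0/1 values are written out explicitly, chosen so that their counts match the output shown above, to avoid depending on the random number generator.

```r
# 0/1 data written out explicitly (1 = selected, 0 = not selected).
dat <- data.frame(resp  = 1:10,
                  like1 = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
                  like2 = c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1),
                  like3 = c(1, 0, 1, 0, 0, 0, 0, 0, 1, 0))

items <- dat[-1]
# One label per respondent, e.g. "like1-like3"; "" means nothing selected.
combn <- apply(items == 1, 1,
               function(x) paste(names(items)[x], collapse = "-"))
freq <- table(combn)

# Frequency and proportion of all respondents for each observed combination.
result <- data.frame(Combn = names(freq),
                     Freq  = as.vector(freq),
                     Prop  = as.vector(prop.table(freq)))
result
```

Inside multi.freq.table itself, the equivalent change would be a line such as counts$Prop = counts$Freq / sum(counts$Freq) placed just before the final counts.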

Hope this helps! If not, post an example of the output you want or describe the features you need, and I'll see what can be added.

Update

After looking at examples of SPSS's multiple-response output online, I think the following gives you the same table (frequency, percent of responses, and percent of cases). It is easy enough to wrap into a function if you need it often.

data.frame(Freq = colSums(dat[-1]),
           Pct.of.Resp = (colSums(dat[-1])/sum(dat[-1]))*100,
           Pct.of.Cases = (colSums(dat[-1])/nrow(dat[-1]))*100)
#       Freq Pct.of.Resp Pct.of.Cases
# like1    6    42.85714           60
# like2    5    35.71429           50
# like3    3    21.42857           30
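Wrapping that summary into a function might look like the sketch below. The name multi.resp.summary and its pareto argument are hypothetical; when pareto = TRUE the options are sorted by decreasing frequency and a cumulative percentage of responses is added, which is the ordering and running total a Pareto chart uses.

```r
# Hypothetical helper: summarise 0/1 multiple-response columns with
# frequency, percent of responses and percent of cases.
multi.resp.summary <- function(data, pareto = TRUE) {
  freq <- colSums(data == 1)
  out <- data.frame(Freq         = freq,
                    Pct.of.Resp  = freq / sum(freq) * 100,
                    Pct.of.Cases = freq / nrow(data) * 100)
  if (pareto) {
    # Most frequently selected options first.
    out <- out[order(out$Freq, decreasing = TRUE), ]
    # Running share of all responses (reaches 100 at the last row).
    out$Cum.Pct.of.Resp <- cumsum(out$Pct.of.Resp)
  }
  out
}

dat <- data.frame(resp  = 1:10,
                  like1 = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
                  like2 = c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1),
                  like3 = c(1, 0, 1, 0, 0, 0, 0, 0, 1, 0))
res <- multi.resp.summary(dat[-1])
res
# A Pareto-style bar chart: barplot(res$Freq, names.arg = rownames(res))
```

Percent of responses divides by the total number of selections (14 here), while percent of cases divides by the number of respondents (10), which is why the latter can sum to more than 100.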
answered Nov 09 '22 at 19:11 by A5C1D2H2I1M1N2O1R2T1