How to use R for multiple select questions?

Tags:

r

I am trying to figure out how to analyze multiple select/multiple responses (i.e., 'select all that apply') questions in a survey I recently conducted.

SPSS has nice capabilities for analyzing online survey data and these types of questions, so I am guessing that R has that and more. Dealing with these survey answers is a bit tricky in Excel. For example, I would like to show a histogram/distribution by age of everyone who likes both strawberry and chocolate ice cream.

How should I structure the data set, and what commands would perform basic frequency tabulations, Pareto charts, and logical AND/OR selections?


asked Jul 24 '12 00:07

JHo

2 Answers

I've not found anything that is quite as convenient as the multiple response sets in SPSS. However, you can create groups relatively easily based on common column names, and then use any of the apply() family of functions to iterate through each group. Here's one approach using adply() from the plyr package:

library(plyr)
set.seed(1)
# Fake data with three "like" questions. 0 = not selected, 1 = selected
dat <- data.frame(resp = 1:10,
                  like1 = sample(0:1, 10, TRUE),
                  like2 = sample(0:1, 10, TRUE),
                  like3 = sample(0:1, 10, TRUE)
                  )

adply(dat[grepl("like", colnames(dat))], 2, function(x)
  data.frame(Count = as.data.frame(table(x))[2,2], 
        Perc = as.data.frame(prop.table(table(x)))[2,2]))
#-----
     X1 Count Perc
1 like1     6  0.6
2 like2     5  0.5
3 like3     3  0.3
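
The question also asks about logical AND/OR selections and an age breakdown. As a rough sketch only, assuming the same 0/1 layout with one column per option: the data frame ice, its age column and the flavour columns below are made up for illustration and are not part of the data above.

```r
# Hypothetical data: one row per respondent, 0/1 indicator per flavour,
# plus an invented age column
ice <- data.frame(
  resp       = 1:8,
  age        = c(21, 34, 45, 29, 52, 38, 61, 25),
  strawberry = c(1, 0, 1, 1, 0, 1, 0, 1),
  chocolate  = c(1, 1, 0, 1, 0, 1, 1, 0)
)

# Logical AND: respondents who selected both flavours
both <- ice$strawberry == 1 & ice$chocolate == 1

# Logical OR: respondents who selected at least one of the two
either <- ice$strawberry == 1 | ice$chocolate == 1

sum(both)    # number of respondents selecting both
sum(either)  # number of respondents selecting at least one

# Age distribution (in 10-year bands) of those who selected both
age_bands <- table(cut(ice$age[both], breaks = c(20, 30, 40, 50, 60, 70)))
age_bands
```

Passing ice$age[both] to hist() should draw the same distribution as a histogram.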


answered Nov 09 '22 19:11

Chase

I recently wrote a quick function to deal with these. You can easily modify it to add proportion of total responses too.

set.seed(1)
dat <- data.frame(resp = 1:10,
                  like1 = sample(0:1, 10, TRUE),
                  like2 = sample(0:1, 10, TRUE),
                  like3 = sample(0:1, 10, TRUE))

The function:

multi.freq.table = function(data, sep="", dropzero=FALSE, clean=TRUE) {
  # Takes boolean multiple-response data and tabulates it according
  #   to the possible combinations of each variable.
  #
  # See: http://stackoverflow.com/q/11348391/1270695

  counts = data.frame(table(data))
  N = ncol(counts)
  counts$Combn = apply(counts[-N] == 1, 1, 
                       function(x) paste(names(counts[-N])[x],
                                         collapse=sep))
  # Optionally drop combinations that never occur
  if (isTRUE(dropzero)) {
    counts = counts[counts$Freq != 0, ]
  }
  if (isTRUE(clean)) {
    counts = data.frame(Combn = counts$Combn, Freq = counts$Freq)
  } 
  counts
}

Apply the function:

multi.freq.table(dat[-1], sep="-")
#               Combn Freq
# 1                      1
# 2             like1    2
# 3             like2    2
# 4       like1-like2    2
# 5             like3    1
# 6       like1-like3    1
# 7       like2-like3    0
# 8 like1-like2-like3    1
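
If a proportion column is wanted, one possible way is sketched below. It is written independently of multi.freq.table() so that it runs on its own. The data frame likes is hard-coded, hypothetical data (not the sampled dat above), and the proportion here is the share of respondents in each combination, which is an assumption about what is wanted.

```r
# Hypothetical 0/1 data, hard-coded so the result does not depend on the RNG
likes <- data.frame(
  like1 = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  like2 = c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1),
  like3 = c(1, 0, 1, 0, 0, 0, 0, 0, 1, 0)
)

# One row per possible combination of answers, with its count in Freq
combos <- as.data.frame(table(likes))

# Share of respondents falling into each combination
combos$Prop <- combos$Freq / sum(combos$Freq)

# Show only the combinations that actually occur
combos[combos$Freq > 0, ]
```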

Hope this helps! Otherwise, show some examples of desired output or describe some features, and I'll see what can be added.

Update

After looking at the output of SPSS for this online, it seems like the following should do it for you. This is easy enough to wrap into a function if you need to use it a lot.

data.frame(Freq = colSums(dat[-1]),
           Pct.of.Resp = (colSums(dat[-1])/sum(dat[-1]))*100,
           Pct.of.Cases = (colSums(dat[-1])/nrow(dat[-1]))*100)
#       Freq Pct.of.Resp Pct.of.Cases
# like1    6    42.85714           60
# like2    5    35.71429           50
# like3    3    21.42857           30
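
A rough sketch of wrapping this into a function follows. The name multi.resp.summary is made up here, and the Pareto-style ordering with a cumulative percentage is an addition that the snippet above does not do. The example data is hard-coded and hypothetical.

```r
# Hypothetical helper (not from any package): SPSS-style summary of
# 0/1 multiple-response columns, sorted from most to least selected
multi.resp.summary <- function(data) {
  freq <- colSums(data)
  out <- data.frame(Freq         = freq,
                    Pct.of.Resp  = freq / sum(freq) * 100,
                    Pct.of.Cases = freq / nrow(data) * 100)
  # Pareto-style ordering: most frequently selected option first
  out <- out[order(-out$Freq), ]
  # Running total of the response percentages
  out$Cum.Pct.of.Resp <- cumsum(out$Pct.of.Resp)
  out
}

# Hypothetical data: like1 is selected 3 times, like2 6 times, like3 5 times
likes <- data.frame(
  like1 = c(1, 0, 1, 0, 0, 0, 0, 0, 1, 0),
  like2 = c(1, 0, 1, 1, 0, 1, 1, 0, 1, 0),
  like3 = c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1)
)

res <- multi.resp.summary(likes)
res
```

Something like barplot(res$Freq, names.arg = rownames(res)) should then give the bars of a Pareto chart.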

answered Nov 09 '22 19:11

A5C1D2H2I1M1N2O1R2T1
