Create frequency tables for multiple factor columns in R

Tags: r, sapply, r-factor

I am a novice in R. I am compiling a separate manual on the syntax of the common functions/features I use for my work. My sample data frame is as follows:

x.sample <-
structure(list(Q9_A = structure(c(5L, 3L, 5L, 3L, 5L, 3L, 1L, 
5L, 5L, 5L), .Label = c("Impt", "Neutral", "Not Impt at all", 
"Somewhat Impt", "Very Impt"), class = "factor"), Q9_B = structure(c(5L, 
5L, 5L, 3L, 5L, 5L, 3L, 5L, 3L, 3L), .Label = c("Impt", "Neutral", 
"Not Impt at all", "Somewhat Impt", "Very Impt"), class = "factor"), 
Q9_C = structure(c(3L, 5L, 5L, 3L, 5L, 5L, 3L, 5L, 5L, 3L
), .Label = c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", 
"Very Impt"), class = "factor")), .Names = c("Q9_A", "Q9_B", 
"Q9_C"), row.names = c(NA, 10L), class = "data.frame")

> x.sample
          Q9_A            Q9_B            Q9_C
1        Very Impt       Very Impt Not Impt at all
2  Not Impt at all       Very Impt       Very Impt
3        Very Impt       Very Impt       Very Impt
4  Not Impt at all Not Impt at all Not Impt at all
5        Very Impt       Very Impt       Very Impt
6  Not Impt at all       Very Impt       Very Impt
7             Impt Not Impt at all Not Impt at all
8        Very Impt       Very Impt       Very Impt
9        Very Impt Not Impt at all       Very Impt
10       Very Impt Not Impt at all Not Impt at all

My original dataframe has 21 columns.

If I want to find the mean (treating this as an ordinal variable):

> sapply(x.sample,function(x) mean(as.numeric(x), na.rm=TRUE))
Q9_A Q9_B Q9_C 
 4.0  4.2  4.2
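One caveat with the mean above, sketched in code: as.numeric() on a factor returns the integer codes of the levels in their stored order, which here is alphabetical ("Impt" is 1, "Not Impt at all" is 3). Assuming the intended order runs from "Not Impt at all" up to "Very Impt" (the order used in the frequency table attempt further down), the levels can be re-specified before averaging. The data frame is rebuilt compactly from the dput() output so the block runs on its own; the helper mk and the names alpha_levels, ord_levels, means_alpha and means_ord are illustrative only.

```r
# Rebuild x.sample from the integer codes and alphabetical levels in the dput() output.
alpha_levels <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
mk <- function(codes) factor(alpha_levels[codes], levels = alpha_levels)
x.sample <- data.frame(
  Q9_A = mk(c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)),
  Q9_B = mk(c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)),
  Q9_C = mk(c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3))
)

# Assumed ordinal order, lowest to highest importance.
ord_levels <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Mean of the stored (alphabetical) codes, as in the call above.
means_alpha <- sapply(x.sample, function(x) mean(as.numeric(x), na.rm = TRUE))

# Mean after re-specifying the level order, so code 1 is the lowest rating.
means_ord <- sapply(x.sample, function(x) {
  mean(as.numeric(factor(x, levels = ord_levels)), na.rm = TRUE)
})

means_alpha
means_ord
```

The two results differ because only the second one maps "Not Impt at all" to 1 and "Impt" to 4.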

I would like to produce a frequency table for ALL the variables in my dataframe. I searched the internet and many forums, and the closest approach I found uses sapply. But when I tried it, every count came out as 0.

> sapply(x.sample,function(x) table(factor(x.sample, levels=c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt"), ordered=TRUE)))
                Q9_A Q9_B Q9_C
Not Impt at all    0    0    0
Somewhat Impt      0    0    0
Neutral            0    0    0
Impt               0    0    0
Very Impt          0    0    0

QUESTION How can I make use of sapply to tabulate a frequency chart as per the above table for all the columns (that are factors) in a dataframe?

PS So sorry if this seems trivial, but I have searched for 2 days without finding an answer and have tried all possible combinations. Maybe I didn't search hard enough =(

Thanks very much.


asked Oct 10 '14 03:10

Raphael Lee

2 Answers

You were nearly there. Just one small change in your function would have got you there. The x in function(x) ... needs to be passed through to factor() inside the table() call, instead of the whole data frame x.sample:

levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
sapply(x.sample, function(x) table(factor(x, levels=levs, ordered=TRUE)))
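A minimal sketch of why the original call returned zeros, as I read it: inside the anonymous function, x.sample refers to the whole data frame, so factor() sees one element per column rather than one per respondent, and none of those elements matches a level. The data frame is rebuilt compactly from the question's dput() output so the block runs on its own; the helper mk and the names whole and fixed are illustrative, not from the original post.

```r
# Rebuild x.sample from the integer codes and alphabetical levels in the dput() output.
alpha_levels <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
mk <- function(codes) factor(alpha_levels[codes], levels = alpha_levels)
x.sample <- data.frame(
  Q9_A = mk(c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)),
  Q9_B = mk(c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)),
  Q9_C = mk(c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3))
)
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Original call: factor() is handed the whole data frame on every iteration.
whole <- factor(x.sample, levels = levs)
length(whole)      # one element per column, not one per row
all(is.na(whole))  # no element matches a level, so every count is zero

# Corrected call: factor() is handed one column at a time.
fixed <- sapply(x.sample, function(x) table(factor(x, levels = levs)))
fixed
```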

Rearranging the corrected call a little might make it a bit easier to read too:

sapply(lapply(x.sample, factor, levels = levs, ordered = TRUE), table)

#                Q9_A Q9_B Q9_C
#Not Impt at all    3    4    4
#Somewhat Impt      0    0    0
#Neutral            0    0    0
#Impt               1    0    0
#Very Impt          6    6    6
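The question also asks about data frames where only some columns are factors. One way to handle that, sketched here on a small made-up data frame (survey, id, Q1 and Q2 are hypothetical names, not from the question), is to pick out the factor columns with is.factor before tabulating. Because every column is tabulated against the same levels, each table has the same length and sapply can bind them into one matrix.

```r
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Hypothetical mixed data frame: one numeric id column and two factor columns.
survey <- data.frame(
  id = 1:4,
  Q1 = factor(c("Very Impt", "Impt", "Very Impt", "Neutral")),
  Q2 = factor(c("Neutral", "Neutral", "Very Impt", "Not Impt at all"))
)

# Logical vector marking which columns are factors.
is_fac <- vapply(survey, is.factor, logical(1))

# Tabulate only the factor columns, each against the full set of levels.
freq <- sapply(survey[is_fac], function(x) table(factor(x, levels = levs)))
freq
```

Using vapply with logical(1) rather than sapply for the column test is a small safety choice: it fails loudly if is.factor ever returned something other than a single TRUE or FALSE.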


answered Oct 02 '22 01:10

thelatemail

Coming a bit late, but here's a possible reshape2 solution. It would have been very straightforward with recast, but we need to keep empty factor levels here, which means specifying both factorsAsStrings = FALSE within melt and drop = FALSE within dcast. recast can't pass arguments to melt (only to dcast), so the two steps are written out separately:

library(reshape2)
x.sample$indx <- 1 
dcast(melt(x.sample, "indx", factorsAsStrings = FALSE), value ~ variable, drop = FALSE)
#             value Q9_A Q9_B Q9_C
# 1            Impt    1    0    0
# 2         Neutral    0    0    0
# 3 Not Impt at all    3    4    4
# 4   Somewhat Impt    0    0    0
# 5       Very Impt    6    6    6

If we didn't care about empty levels, a quick solution would be just:

recast(x.sample, value ~ variable, id.var = "indx")
#             value Q9_A Q9_B Q9_C
# 1            Impt    1    0    0
# 2 Not Impt at all    3    4    4
# 3       Very Impt    6    6    6

Alternatively, if speed is a concern, we can do the same using data.table:

library(data.table)
dcast(melt(setDT(x.sample), measure.vars = names(x.sample), value.factor = TRUE), 
           value ~ variable, drop = FALSE)
#              value Q9_A Q9_B Q9_C
# 1:            Impt    1    0    0
# 2:         Neutral    0    0    0
# 3: Not Impt at all    3    4    4
# 4:   Somewhat Impt    0    0    0
# 5:       Very Impt    6    6    6
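For comparison, the same melt-then-count idea can be sketched in base R alone. This is a substitute technique, not what the reshape2 or data.table code above does internally: the columns are stacked by hand into one long factor plus a matching vector of column names, and table() cross-tabulates the two. The data frame is rebuilt compactly from the question's dput() output so the block runs on its own; the helper mk and the names value, variable and tab are illustrative.

```r
# Rebuild x.sample from the integer codes and alphabetical levels in the dput() output.
alpha_levels <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
mk <- function(codes) factor(alpha_levels[codes], levels = alpha_levels)
x.sample <- data.frame(
  Q9_A = mk(c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)),
  Q9_B = mk(c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)),
  Q9_C = mk(c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3))
)

# "Melt" by hand: all answers in one long factor, column by column,
# with the full set of levels kept so unused ones are not lost.
value <- factor(
  unlist(lapply(x.sample, as.character), use.names = FALSE),
  levels = alpha_levels
)

# Matching vector that records which column each answer came from.
variable <- rep(names(x.sample), each = nrow(x.sample))

# Cross-tabulate; table() keeps factor levels that never occur,
# so "Neutral" and "Somewhat Impt" appear as rows of zeros.
tab <- table(value = value, variable = variable)
tab
```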

answered Oct 02 '22 01:10

David Arenburg
