Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create frequency tables for multiple factor columns in R

Tags:

r

sapply

r-factor

I am a novice in R. I am compiling a separate manual on the syntax for the common functions/features for my work. My sample dataframe as follows:

x.sample <-
structure(list(Q9_A = structure(c(5L, 3L, 5L, 3L, 5L, 3L, 1L, 
5L, 5L, 5L), .Label = c("Impt", "Neutral", "Not Impt at all", 
"Somewhat Impt", "Very Impt"), class = "factor"), Q9_B = structure(c(5L, 
5L, 5L, 3L, 5L, 5L, 3L, 5L, 3L, 3L), .Label = c("Impt", "Neutral", 
"Not Impt at all", "Somewhat Impt", "Very Impt"), class = "factor"), 
Q9_C = structure(c(3L, 5L, 5L, 3L, 5L, 5L, 3L, 5L, 5L, 3L
), .Label = c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", 
"Very Impt"), class = "factor")), .Names = c("Q9_A", "Q9_B", 
"Q9_C"), row.names = c(NA, 10L), class = "data.frame")

> x.sample
          Q9_A            Q9_B            Q9_C
1        Very Impt       Very Impt Not Impt at all
2  Not Impt at all       Very Impt       Very Impt
3        Very Impt       Very Impt       Very Impt
4  Not Impt at all Not Impt at all Not Impt at all
5        Very Impt       Very Impt       Very Impt
6  Not Impt at all       Very Impt       Very Impt
7             Impt Not Impt at all Not Impt at all
8        Very Impt       Very Impt       Very Impt
9        Very Impt Not Impt at all       Very Impt
10       Very Impt Not Impt at all Not Impt at all

My original dataframe has 21 columns.

If I want to find the mean (treating this as an ordinal variable):

> sapply(x.sample,function(x) mean(as.numeric(x), na.rm=TRUE))
Q9_A Q9_B Q9_C 
 4.0  4.2  4.2

I would like to tabulate a frequency table for ALL the variables in my dataframe. I searched the internet and many forums and saw that the nearest command to do this is using sapply. But when I did it, it gave all 0s.

> sapply(x.sample,function(x) table(factor(x.sample, levels=c("Not Impt at all", "Somewhat Impt",            "Neutral", "Impt", "Very Impt"), ordered=TRUE)))
                Q9_A Q9_B Q9_C
Not Impt at all    0    0    0
Somewhat Impt      0    0    0
Neutral            0    0    0
Impt               0    0    0
Very Impt          0    0    0

QUESTION How can I make use of sapply to tabulate a frequency chart as per the above table for all the columns (that are factors) in a dataframe?

PS So sorry if this seems trivia but I have searched for 2 days without an answer and trying all possible combinations. Maybe I didn't search hard enough =(

Thanks very much.

like image 980
Raphael Lee Avatar asked Oct 10 '14 03:10

Raphael Lee


People also ask

How do I create a frequency table for categorical data in R?

To create a frequency column for categorical variable in an R data frame, we can use the transform function by defining the length of categorical variable using ave function. The output will have the duplicated frequencies as one value in the categorical column is likely to be repeated.

How do I convert multiple columns to factor in R?

In R, you can convert multiple numeric variables to factor using lapply function. The lapply function is a part of apply family of functions. They perform multiple iterations (loops) in R. In R, categorical variables need to be set as factor variables.


2 Answers

You were nearly there. Just one small change in your function would have got you there. The x in function(x) ... needs to be passed through to the table() call:

levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
sapply(x.sample, function(x) table(factor(x, levels=levs, ordered=TRUE)))

A little re-jig of the code might make it a bit easier to read too:

sapply(lapply(x.sample,factor,levels=levs,ordered=TRUE), table)

#                Q9_A Q9_B Q9_C
#Not Impt at all    3    4    4
#Somewhat Impt      0    0    0
#Neutral            0    0    0
#Impt               1    0    0
#Very Impt          6    6    6
like image 162
thelatemail Avatar answered Oct 02 '22 01:10

thelatemail


Coming a bit late, but here's a reshape2 possible solution. It could have been very straightforward with recast but we need to handle empty factor levels here so we need to specify both factorsAsStrings = FALSE within melt and drop = FALSE within dcast, while recast can't pass arguments to melt (only to dcast), so here goes

library(reshape2)
x.sample$indx <- 1 
dcast(melt(x.sample, "indx", factorsAsStrings = FALSE), value ~ variable, drop = FALSE)
#             value Q9_A Q9_B Q9_C
# 1            Impt    1    0    0
# 2         Neutral    0    0    0
# 3 Not Impt at all    3    4    4
# 4   Somewhat Impt    0    0    0
# 5       Very Impt    6    6    6

If we wouldn't care about empty levels a quick solution would be just

recast(x.sample, value ~ variable, id.var = "indx")
#             value Q9_A Q9_B Q9_C
# 1            Impt    1    0    0
# 2 Not Impt at all    3    4    4
# 3       Very Impt    6    6    6

Alternatively, if speed is a concern, we can do the same using data.atble

library(data.table)
dcast(melt(setDT(x.sample), measure.vars = names(x.sample), value.factor = TRUE), 
           value ~ variable, drop = FALSE)
#              value Q9_A Q9_B Q9_C
# 1:            Impt    1    0    0
# 2:         Neutral    0    0    0
# 3: Not Impt at all    3    4    4
# 4:   Somewhat Impt    0    0    0
# 5:       Very Impt    6    6    6
like image 21
David Arenburg Avatar answered Oct 02 '22 01:10

David Arenburg