I am a novice in R. I am compiling a separate manual on the syntax of the common functions/features I use in my work. My sample data frame is as follows:
x.sample <-
structure(list(Q9_A = structure(c(5L, 3L, 5L, 3L, 5L, 3L, 1L,
5L, 5L, 5L), .Label = c("Impt", "Neutral", "Not Impt at all",
"Somewhat Impt", "Very Impt"), class = "factor"), Q9_B = structure(c(5L,
5L, 5L, 3L, 5L, 5L, 3L, 5L, 3L, 3L), .Label = c("Impt", "Neutral",
"Not Impt at all", "Somewhat Impt", "Very Impt"), class = "factor"),
Q9_C = structure(c(3L, 5L, 5L, 3L, 5L, 5L, 3L, 5L, 5L, 3L
), .Label = c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt",
"Very Impt"), class = "factor")), .Names = c("Q9_A", "Q9_B",
"Q9_C"), row.names = c(NA, 10L), class = "data.frame")
> x.sample
              Q9_A            Q9_B            Q9_C
1        Very Impt       Very Impt Not Impt at all
2  Not Impt at all       Very Impt       Very Impt
3        Very Impt       Very Impt       Very Impt
4  Not Impt at all Not Impt at all Not Impt at all
5        Very Impt       Very Impt       Very Impt
6  Not Impt at all       Very Impt       Very Impt
7             Impt Not Impt at all Not Impt at all
8        Very Impt       Very Impt       Very Impt
9        Very Impt Not Impt at all       Very Impt
10       Very Impt Not Impt at all Not Impt at all
My original dataframe has 21 columns.
If I want to find the mean (treating this as an ordinal variable):
> sapply(x.sample,function(x) mean(as.numeric(x), na.rm=TRUE))
Q9_A Q9_B Q9_C
4.0 4.2 4.2
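One caveat about the mean above: as.numeric() on a factor returns the internal level codes, and the levels in this data frame are stored alphabetically ("Impt" = 1, "Neutral" = 2, "Not Impt at all" = 3, ...), so 4.0 / 4.2 / 4.2 does not reflect the ordinal scale. A minimal sketch of one way around this, assuming the intended order runs from "Not Impt at all" (1) up to "Very Impt" (5); the names labs, levs and ordinal_mean are only illustrative:

```r
# Rebuild the sample data using the level codes from the dput() output above.
labs <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
x.sample <- data.frame(
  Q9_A = factor(labs[c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)], levels = labs),
  Q9_B = factor(labs[c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)], levels = labs),
  Q9_C = factor(labs[c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3)], levels = labs)
)

# Assumed ordinal order, lowest to highest.
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Re-level each column first, then average the re-ordered integer codes.
ordinal_mean <- sapply(
  x.sample,
  function(x) mean(as.integer(factor(x, levels = levs)), na.rm = TRUE)
)
ordinal_mean
```

With this ordering the means come out lower than before, because "Not Impt at all" now scores 1 instead of 3.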
I would like to tabulate a frequency table for ALL the variables in my data frame. I searched the internet and many forums, and the closest approach I found was to use sapply. But when I tried it, every count came out as 0.
> sapply(x.sample,function(x) table(factor(x.sample, levels=c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt"), ordered=TRUE)))
Q9_A Q9_B Q9_C
Not Impt at all 0 0 0
Somewhat Impt 0 0 0
Neutral 0 0 0
Impt 0 0 0
Very Impt 0 0 0
QUESTION How can I make use of sapply to tabulate a frequency chart as per the above table for all the columns (that are factors) in a dataframe?
PS So sorry if this seems trivial, but I have searched for 2 days without finding an answer and have tried every combination I could think of. Maybe I didn't search hard enough =(
Thanks very much.
You were nearly there. Just one small change in your function would have got you there. The x in function(x) ... needs to be passed through to the table() call:
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
sapply(x.sample, function(x) table(factor(x, levels=levs, ordered=TRUE)))
A little re-jig of the code might make it a bit easier to read too:
sapply(lapply(x.sample,factor,levels=levs,ordered=TRUE), table)
# Q9_A Q9_B Q9_C
#Not Impt at all 3 4 4
#Somewhat Impt 0 0 0
#Neutral 0 0 0
#Impt 1 0 0
#Very Impt 6 6 6
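To see why the original attempt printed only zeros, and to cover the "columns that are factors" part of the question, here is a minimal sketch. It assumes the sample data from the question; the extra id column and the names bad, x.mixed and freq are hypothetical and only there for illustration. Filter(is.factor, ...) is one base R way to keep just the factor columns.

```r
# Rebuild the sample data using the level codes from the question's dput() output.
labs <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
x.sample <- data.frame(
  Q9_A = factor(labs[c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)], levels = labs),
  Q9_B = factor(labs[c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)], levels = labs),
  Q9_C = factor(labs[c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3)], levels = labs)
)
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# The original call handed the whole data frame to factor() on every iteration.
# None of its elements matches a level, so everything becomes NA and
# table() has nothing to count.
bad <- factor(x.sample, levels = levs)
all(is.na(bad))

# Hypothetical mixed-type data frame: three factor columns plus an integer id.
x.mixed <- cbind(x.sample, id = 1:10)

# Keep only the factor columns, then tabulate each one against the full level set.
freq <- sapply(
  Filter(is.factor, x.mixed),
  function(x) table(factor(x, levels = levs))
)
freq
```

Because every column is tabulated against the same levs vector, each table has the same length and sapply can simplify the result to a matrix with one row per level.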
Coming a bit late, but here's a possible reshape2 solution. It could have been very straightforward with recast, but we need to handle empty factor levels here, so we need to specify both factorsAsStrings = FALSE within melt and drop = FALSE within dcast, while recast can't pass arguments to melt (only to dcast), so here goes:
library(reshape2)
x.sample$indx <- 1
dcast(melt(x.sample, "indx", factorsAsStrings = FALSE), value ~ variable, drop = FALSE)
# value Q9_A Q9_B Q9_C
# 1 Impt 1 0 0
# 2 Neutral 0 0 0
# 3 Not Impt at all 3 4 4
# 4 Somewhat Impt 0 0 0
# 5 Very Impt 6 6 6
If we don't care about empty levels, a quick solution would be just
recast(x.sample, value ~ variable, id.var = "indx")
# value Q9_A Q9_B Q9_C
# 1 Impt 1 0 0
# 2 Not Impt at all 3 4 4
# 3 Very Impt 6 6 6
Alternatively, if speed is a concern, we can do the same using data.table (starting again from the original x.sample, without the indx column added above):
library(data.table)
dcast(melt(setDT(x.sample), measure.vars = names(x.sample), value.factor = TRUE),
value ~ variable, drop = FALSE)
# value Q9_A Q9_B Q9_C
# 1: Impt 1 0 0
# 2: Neutral 0 0 0
# 3: Not Impt at all 3 4 4
# 4: Somewhat Impt 0 0 0
# 5: Very Impt 6 6 6