I've loaded data from a CSV file into a data frame. Each column represents a survey question, and all of the answers are on a five-point Likert scale, with the labels: ("None", "Low", "Medium", "High", "Very High").
When I read in the data initially, R correctly interprets those values as factors but doesn't know what the ordering should be. I want to specify what the ordering is for the values so I can do some numerical calculations. I thought the following code would work:
X <- read.csv('..')
likerts <- data.frame(apply(X, 2, function(X){factor(X,
levels = c("None", "Low", "Medium", "High", "Very High"),
ordered = T)}))
What happens instead is that all of the level data gets converted into strings. How do I do this correctly?
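apply() works on a matrix and returns a character matrix here, so the ordered factors your function creates are dropped. data.frame() then converts those strings back to plain factors (or leaves them as strings if stringsAsFactors = FALSE). Use lapply together with as.data.frame instead. A trivial example with a toy data frame: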
X <- data.frame(
  var1 = rep(letters[1:5], 3),
  var2 = rep(letters[1:5], each = 3)
)
likerts <- as.data.frame(lapply(X, function(X) {
  ordered(X, levels = letters[5:1], labels = letters[5:1])
}))
> str(likerts)
'data.frame': 15 obs. of 2 variables:
$ var1: Ord.factor w/ 5 levels "e"<"d"<"c"<"b"<..: 5 4 3 2 1 5 4 3 2 1 ...
$ var2: Ord.factor w/ 5 levels "e"<"d"<"c"<"b"<..: 5 5 5 4 4 4 3 3 3 2 ...
On a side note, ordered() gives you an ordered factor directly, and lapply(X, ...) is a better fit than apply(X, 2, ...) for data frames, because it iterates over the columns as a list and avoids the conversion to a matrix.
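Applied to the Likert labels from the question, the same approach might look like the sketch below. The data frame `survey` and its column names `q1` and `q2` are made up for illustration; with real data you would use the result of read.csv instead. The last step assumes that "numerical calculations" means working with each answer's position on the scale.

```r
# Toy survey data; `survey`, `q1` and `q2` are hypothetical names.
lvls <- c("None", "Low", "Medium", "High", "Very High")
survey <- data.frame(
  q1 = c("Low", "High", "None", "Very High", "Medium"),
  q2 = c("Medium", "Medium", "Low", "High", "Very High"),
  stringsAsFactors = TRUE
)

# lapply() walks the columns as a list, so each ordered factor survives intact.
likerts <- as.data.frame(lapply(survey, ordered, levels = lvls))

# as.integer() gives the position of each answer in `lvls`
# (None = 1, Low = 2, ..., Very High = 5).
scores <- as.data.frame(lapply(likerts, as.integer))

str(likerts)
colMeans(scores)
```

Because the factors are ordered, comparisons such as likerts$q1 >= "High" also work without converting to integers first.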
And the obligatory plyr solution (using Joris's example above):
> require(plyr)
> Y <- catcolwise( function(v) ordered(v, levels = letters[5:1]))(X)
> str(Y)
'data.frame': 15 obs. of 2 variables:
$ var1: Ord.factor w/ 5 levels "e"<"d"<"c"<"b"<..: 5 4 3 2 1 5 4 3 2 1 ...
$ var2: Ord.factor w/ 5 levels "e"<"d"<"c"<"b"<..: 5 5 5 4 4 4 3 3 3 2 ...
Note that one good thing about catcolwise is that it will only apply the function to the columns of X that are factors, leaving the others alone. To explain what is going on: catcolwise is a function that takes a function as an argument and returns a function that operates "columnwise" on the factor columns of the data frame. So we can imagine the above line in two stages: fn <- catcolwise(...); Y <- fn(X). Note that there are also the functions colwise (operates on all columns) and numcolwise (operates only on numerical columns).
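To make the "function that returns a function" idea concrete, here is a minimal base R sketch of the same pattern. The helper `factor_colwise` is a hypothetical name and is not part of plyr, and it is not how plyr implements catcolwise. One deliberate choice in this sketch is that the untouched columns are kept in the result; plyr's own return value may differ, so check its output against your data.

```r
# Hypothetical helper, not part of plyr: wraps `f` so that it is applied only
# to the factor columns of a data frame; all other columns are kept unchanged.
factor_colwise <- function(f) {
  function(df, ...) {
    is_fac <- vapply(df, is.factor, logical(1))
    df[is_fac] <- lapply(df[is_fac], f, ...)
    df
  }
}

X <- data.frame(
  var1 = factor(rep(letters[1:5], 3)),
  var2 = factor(rep(letters[1:5], each = 3)),
  n    = 1:15
)

# Stage 1: build the column-wise function. Stage 2: apply it to the data frame.
fn <- factor_colwise(function(v) ordered(v, levels = letters[5:1]))
Y  <- fn(X)
str(Y)
```

Here var1 and var2 come back as ordered factors, while the integer column n passes through as it was.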