I'm working with an educational dataset: 426 students' answers to eight multiple choice questions (1
= correct, 0
= incorrect), plus a column indicating which instructor (1, 2, or 3
) taught their course.
As it stands, my data is sitting pretty in data.df
, like this:
str(data.df)
'data.frame': 426 obs. of 9 variables:
$ ques01: int 1 1 1 1 1 1 0 0 0 1 ...
$ ques02: int 0 0 1 1 1 1 1 1 1 1 ...
$ ques03: int 0 0 1 1 0 0 1 1 0 1 ...
$ ques04: int 1 0 1 1 1 1 1 1 1 1 ...
$ ques05: int 0 0 0 0 1 0 0 0 0 0 ...
$ ques06: int 1 0 1 1 0 1 1 1 1 1 ...
$ ques07: int 0 0 1 1 0 1 1 0 0 1 ...
$ ques08: int 0 0 1 1 1 0 1 1 0 1 ...
$ inst : num 1 1 1 1 1 1 1 1 1 1 ...
But those ques0x
values aren't really integers. Rather, I think it's better to have R treat them as experimental factors. Same goes for the "inst" values.
int
s and num
s into factors
Ideally, an elegant solution should produce a dataframe—I call it factorData.df
—that looks like this:
str(factorData.df)
'data.frame': 426 obs. of 9 variables:
$ ques01: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 1 2 ...
$ ques02: Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 2 2 ...
$ ques03: Factor w/ 2 levels "0","1": 1 1 2 2 1 1 2 2 1 2 ...
$ ques04: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
$ ques05: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
$ ques06: Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 2 2 2 ...
$ ques07: Factor w/ 2 levels "0","1": 1 1 2 2 1 2 2 1 1 2 ...
$ ques08: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 2 2 1 2 ...
$ inst : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
I'm fairly certain that whatever solution you folks come up with, it ought to be easy to generalize to any n number of variables that'd need to get reclassified, and would work across most common conversions (int -> factor
and num -> int
, for example).
Because my current clunky code is just 9 separate factor()
statements, one for each variable, like this
factorData.df$ques01
I'm brand-new to R, programming, and stackoverflow. Please be gentle, and thanks in advance for your help!
This was also answered in R-Help.
I imagine that there's a better way to do it, but here are two options:
# use a sample data set
> str(cars)
'data.frame': 50 obs. of 2 variables:
$ speed: num 4 4 7 7 8 9 10 10 10 11 ...
$ dist : num 2 10 4 22 16 10 18 26 34 17 ...
> data.df <- cars
You can use lapply
:
> data.df <- data.frame(lapply(data.df, factor))
Or a for
statement:
> for(i in 1:ncol(data.df)) data.df[,i] <- as.factor(data.df[,i])
In either case, you end up with what you want:
> str(data.df)
'data.frame': 50 obs. of 2 variables:
$ speed: Factor w/ 19 levels "4","7","8","9",..: 1 1 2 2 3 4 5 5 5 6 ...
$ dist : Factor w/ 35 levels "2","4","10","14",..: 1 3 2 9 5 3 7 11 14 6 ...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With