
How can I take multiple vectors and recode their datatypes in R?

Tags: r, statistics

I'm looking for an elegant way to change multiple vectors' datatypes in R.

I'm working with an educational dataset: 426 students' answers to eight multiple choice questions (1 = correct, 0 = incorrect), plus a column indicating which instructor (1, 2, or 3) taught their course.

As it stands, my data is sitting pretty in data.df, like this:

    str(data.df)
    'data.frame': 426 obs. of  9 variables:
    $ ques01: int  1 1 1 1 1 1 0 0 0 1 ...
    $ ques02: int  0 0 1 1 1 1 1 1 1 1 ...
    $ ques03: int  0 0 1 1 0 0 1 1 0 1 ...
    $ ques04: int  1 0 1 1 1 1 1 1 1 1 ...
    $ ques05: int  0 0 0 0 1 0 0 0 0 0 ...
    $ ques06: int  1 0 1 1 0 1 1 1 1 1 ...
    $ ques07: int  0 0 1 1 0 1 1 0 0 1 ...
    $ ques08: int  0 0 1 1 1 0 1 1 0 1 ...
    $ inst  : num  1 1 1 1 1 1 1 1 1 1 ...

But those ques0x values aren't really integers. Rather, I think it's better to have R treat them as experimental factors. Same goes for the "inst" values.

I'd love to turn all those ints and nums into factors.

Ideally, an elegant solution should produce a data frame (I call it factorData.df) that looks like this:

    str(factorData.df)
    'data.frame': 426 obs. of  9 variables:
    $ ques01: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 1 2 ...
    $ ques02: Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 2 2 ...
    $ ques03: Factor w/ 2 levels "0","1": 1 1 2 2 1 1 2 2 1 2 ...
    $ ques04: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
    $ ques05: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
    $ ques06: Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 2 2 2 ...
    $ ques07: Factor w/ 2 levels "0","1": 1 1 2 2 1 2 2 1 1 2 ...
    $ ques08: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 2 2 1 2 ...
    $ inst  : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...

I'm fairly certain that whatever solution you folks come up with ought to generalize easily to any number of variables that need reclassifying, and work across most common conversions (int -> factor and num -> int, for example).

No matter what solution you folks generate, it's bound to be more elegant than mine, because my current clunky code is just nine separate factor() statements, one for each variable, like this:

    factorData.df$ques01 

I'm brand-new to R, programming, and stackoverflow. Please be gentle, and thanks in advance for your help!

briandk asked Sep 28 '09 20:09


1 Answer

This was also answered in R-Help.

I imagine that there's a better way to do it, but here are two options:

    # Use a sample data set
    > str(cars)
    'data.frame':   50 obs. of  2 variables:
     $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
     $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
    > data.df <- cars

You can use lapply:

    > data.df <- data.frame(lapply(data.df, factor))
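
If only some of the columns need converting, a variation on the lapply idea may help. The sketch below assigns the result back into the selected columns, so the object stays a data frame and the other columns keep their types. The `cols` vector is a hypothetical name used for illustration, not something from the original question.

```r
# Sample data: the built-in cars data set (two numeric columns)
data.df <- cars

# Hypothetical selection: names of the columns to recode
cols <- c("speed")

# Replace only the selected columns; everything else is left alone
data.df[cols] <- lapply(data.df[cols], factor)

str(data.df)
```

Using `names(data.df)` for `cols` converts every column, which should match the one-liner above.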

Or a for statement:

    > for (i in seq_along(data.df)) data.df[[i]] <- as.factor(data.df[[i]])

In either case, you end up with what you want:

    > str(data.df)
    'data.frame':   50 obs. of  2 variables:
     $ speed: Factor w/ 19 levels "4","7","8","9",..: 1 1 2 2 3 4 5 5 5 6 ...
     $ dist : Factor w/ 35 levels "2","4","10","14",..: 1 3 2 9 5 3 7 11 14 6 ...
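
To cover the other conversions the question mentions (num -> int, for example), the same pattern can be wrapped in a small function. This is only a sketch: `recode_columns` and its arguments are hypothetical names introduced here, and it assumes the conversion function accepts a whole column at once.

```r
# Hypothetical helper: apply one conversion function to the chosen columns
recode_columns <- function(df, cols, convert) {
  df[cols] <- lapply(df[cols], convert)
  df
}

# num -> int on both columns of the sample data set
int.df <- recode_columns(cars, c("speed", "dist"), as.integer)

# num -> factor on every column
factor.df <- recode_columns(cars, names(cars), factor)

str(int.df)
str(factor.df)
```

One caveat when going the other way: calling as.numeric() on a factor returns the internal level codes, not the original values, so as.numeric(as.character(x)) is the usual route back to numbers.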
Shane answered Nov 15 '22 03:11