
How do I convert certain columns of a data frame to become factors? [duplicate]

Possible Duplicate:
identifying or coding unique factors using R

I'm having some trouble with R.

I have a data set similar to the following, but much longer.

    A B Pulse
    1 2 23
    2 2 24
    2 2 12
    2 3 25
    1 1 65
    1 3 45

Basically, the first 2 columns are coded. A has 1, 2 which represent 2 different weights. B has 1, 2, 3 which represent 3 different times.

As they are coded numerical values, R will treat them as numerical variables. I need to use the factor function to convert these variables into factors.

Help?

math11 asked Nov 28 '12 20:11



People also ask

How do you convert multiple columns to factors?

In R, you can convert several numeric columns to factors with the lapply function. lapply is part of the apply family of functions: it calls a function on each element of a list (here, each selected column of the data frame) and returns the results as a list. Categorical variables should be stored as factors so that R treats their values as categories rather than as numbers.

How do you change all columns to factor in R?

To convert every column of a data frame to a factor, apply the factor function to each column with lapply and assign the result back to the data frame, e.g. df[] <- lapply(df, factor).
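A minimal sketch of the lapply approach applied to only some columns, using a small data frame shaped like the one in the question. The names `dat` and `cols` are illustrative assumptions, not anything from the original post.

```r
# Small data frame shaped like the one in the question
dat <- data.frame(A = c(1, 2, 2, 2, 1, 1),
                  B = c(2, 2, 2, 3, 1, 3),
                  Pulse = c(23, 24, 12, 25, 65, 45))

# Columns to convert; Pulse stays numeric
cols <- c("A", "B")

# lapply() calls factor() on each selected column and returns a list,
# which is assigned back into the same columns of the data frame
dat[cols] <- lapply(dat[cols], factor)

# Inspect the resulting column types
str(dat)
```

Selecting columns by name keeps the measured variable (Pulse) numeric while the coded variables become factors.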


1 Answer

Here's an example:

    # Create a data frame
    > d <- data.frame(a=1:3, b=2:4)
    > d
      a b
    1 1 2
    2 2 3
    3 3 4

    # Currently, there are no levels in the `a` column, since it's numeric as you point out.
    > levels(d$a)
    NULL

    # Convert that column to a factor
    > d$a <- factor(d$a)
    > d
      a b
    1 1 2
    2 2 3
    3 3 4

    # Now it has levels.
    > levels(d$a)
    [1] "1" "2" "3"

You can also handle this when reading in your data. See the colClasses and stringsAsFactors parameters of, e.g., read.csv().
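As a rough illustration of the read-time approach, the sketch below passes colClasses to read.csv() so the coded columns arrive as factors. The inline CSV text stands in for a real file and is an assumption based on the sample in the question. Note that stringsAsFactors only affects character columns, so for numerically coded columns colClasses is the relevant parameter.

```r
# Inline CSV standing in for a file on disk (assumed layout)
csv_text <- "A,B,Pulse
1,2,23
2,2,24
2,2,12
2,3,25
1,1,65
1,3,45"

# One class per column, in column order: A and B as factors, Pulse as numeric
dat <- read.csv(text = csv_text,
                colClasses = c("factor", "factor", "numeric"))

str(dat)
```

With a real file, the `text =` argument would be replaced by the file path as the first argument.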

Note that, computationally, factoring such columns won't help you much, and may actually slow down your program (albeit negligibly). Using a factor will require that all values are mapped to IDs behind the scenes, so any print of your data.frame requires a lookup on those levels -- an extra step which takes time.
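The mapping described above can be inspected directly. This is a small sketch with a made-up vector `f`, showing the levels table and the integer codes a factor keeps for each element:

```r
# A factor built from numeric codes
f <- factor(c(2, 2, 3, 1))

# The lookup table of level labels
levels(f)

# The integer code stored for each element (its position in levels(f))
as.integer(f)

# Underneath, a factor is an integer vector with a levels attribute
typeof(f)
```

Printing the factor looks up each code in the levels table, which is the extra step mentioned above.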

Factors are great when storing strings which you don't want to store repeatedly, but would rather reference by their ID. Consider storing a more friendly name in such columns to fully benefit from factors.
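One way to attach friendlier names is the labels argument of factor(). The labels below ("weight1", "time1", and so on) are hypothetical placeholders, since the question does not say which weights and times the codes stand for.

```r
# Data shaped like the sample in the question
dat <- data.frame(A = c(1, 2, 2, 2, 1, 1),
                  B = c(2, 2, 2, 3, 1, 3),
                  Pulse = c(23, 24, 12, 25, 65, 45))

# Map each numeric code to a readable label (placeholder names)
dat$A <- factor(dat$A, levels = c(1, 2),
                labels = c("weight1", "weight2"))
dat$B <- factor(dat$B, levels = c(1, 2, 3),
                labels = c("time1", "time2", "time3"))

# Cross-tabulate the two factors; rows and columns show the labels
table(dat$A, dat$B)
```

Giving levels explicitly fixes the code-to-label mapping even if some code happens not to appear in the data.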

Jeff Allen answered Sep 22 '22 06:09
