Possible Duplicate:
identifying or coding unique factors using R
I'm having some trouble with R.
I have a data set similar to the following, but much longer.
    A B Pulse
    1 2    23
    2 2    24
    2 2    12
    2 3    25
    1 1    65
    1 3    45
Basically, the first 2 columns are coded. `A` has 1, 2, which represent 2 different weights. `B` has 1, 2, 3, which represent 3 different times.
As they are coded numerical values, R will treat them as numerical variables. I need to use the factor function to convert these variables into factors.
Help?
In R, you can convert several numeric variables to factors at once with the `lapply` function. `lapply` belongs to the apply family of functions, which call a function on each element of a list (or each column of a data frame) in place of an explicit loop. Categorical variables that are coded as numbers need to be stored as factors.
To convert every column from integer to factor, combine `lapply` with `factor`, e.g. `d[] <- lapply(d, factor)`.
Here's an example:
    # Create a data frame
    > d <- data.frame(a=1:3, b=2:4)
    > d
      a b
    1 1 2
    2 2 3
    3 3 4

    # Currently, there are no levels in the `a` column, since it's numeric as you point out.
    > levels(d$a)
    NULL

    # Convert that column to a factor
    > d$a <- factor(d$a)
    > d
      a b
    1 1 2
    2 2 3
    3 3 4

    # Now it has levels.
    > levels(d$a)
    [1] "1" "2" "3"
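The `lapply` approach mentioned above could be applied to the data from the question roughly like this. This is only a sketch: the data frame is retyped from the six sample rows, and the names `df` and `code_cols` are made up for illustration.

```r
# Rebuild the sample data from the question (only the six rows shown there)
df <- data.frame(
  A     = c(1, 2, 2, 2, 1, 1),
  B     = c(2, 2, 2, 3, 1, 3),
  Pulse = c(23, 24, 12, 25, 65, 45)
)

# Columns that hold category codes; Pulse is a measurement and stays numeric
code_cols <- c("A", "B")

# lapply() calls factor() on each selected column and returns a list,
# which is assigned back into the same columns
df[code_cols] <- lapply(df[code_cols], factor)

str(df)        # A and B are now factors, Pulse is still numeric
levels(df$A)   # "1" "2"
levels(df$B)   # "1" "2" "3"
```

Selecting the columns by name keeps `Pulse` out of the conversion. With `df[] <- lapply(df, factor)` every column would be converted, including the measurement.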
You can also handle this when reading in your data. See the `colClasses` and `stringsAsFactors` parameters in e.g. `read.csv()`.
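As a rough illustration of doing the conversion at read time, the sketch below passes `colClasses` to `read.csv()`. The CSV content is typed inline through the `text` argument so the example runs on its own; in practice you would pass a file path instead. The names `csv_text` and `df2` are made up for this example.

```r
# Inline CSV standing in for a real file (sample rows from the question)
csv_text <- "A,B,Pulse
1,2,23
2,2,24
2,2,12
2,3,25
1,1,65
1,3,45"

# A named colClasses vector sets the class of each column as it is read
df2 <- read.csv(
  text = csv_text,
  colClasses = c(A = "factor", B = "factor", Pulse = "numeric")
)

sapply(df2, class)   # A and B are "factor", Pulse is "numeric"
levels(df2$B)        # "1" "2" "3"
```

Note that `stringsAsFactors` only affects character columns, so on its own it would leave numeric codes like these as numbers. `colClasses` is the parameter that helps here.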
Note that, computationally, factoring such columns won't help you much, and may actually slow down your program (albeit negligibly). Using a factor will require that all values are mapped to IDs behind the scenes, so any print of your data.frame requires a lookup on those levels -- an extra step which takes time.
Factors are great when storing strings which you don't want to store repeatedly, but would rather reference by their ID. Consider storing a more friendly name in such columns to fully benefit from factors.
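To illustrate the point about friendlier names, here is a minimal sketch that attaches labels to the weight codes. The labels "light" and "heavy" are hypothetical, since the question does not say what the codes 1 and 2 stand for.

```r
# Weight codes from column A of the sample data
weight_codes <- c(1, 2, 2, 2, 1, 1)

# levels lists the codes found in the data; labels gives each one a readable name.
# "light" and "heavy" are placeholder labels, not taken from the question.
weight <- factor(weight_codes, levels = c(1, 2), labels = c("light", "heavy"))

weight               # prints the labels instead of the raw codes
levels(weight)       # "light" "heavy"
as.integer(weight)   # the integer IDs stored behind the scenes: 1 2 2 2 1 1
table(weight)        # counts per category, shown by label
```

The factor still stores one integer per observation. The strings are stored once, as the levels, and are looked up when the values are printed.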