How does one change the levels of a factor column in a data.table?

Tags: r, data.table

What is the correct way to change the levels of a factor column in a data.table (note: not a data.frame)?

    library(data.table)
    mydt <- data.table(id=1:6, value=as.factor(c("A", "A", "B", "B", "B", "C")), key="id")
    mydt[, levels(value)]
    [1] "A" "B" "C"

I am looking for something like:

mydt[, levels(value) <- c("X", "Y", "Z")] 

But of course, the above line does not work.

    # Actual               # Expected result
    > mydt                 > mydt
       id value               id value
    1:  1     A            1:  1     X
    2:  2     A            2:  2     X
    3:  3     B            3:  3     Y
    4:  4     B            4:  4     Y
    5:  5     B            5:  5     Y
    6:  6     C            6:  6     Z
Ricardo Saporta asked Jan 31 '13 20:01





2 Answers

You can still set them the traditional way:

levels(mydt$value) <- c(...) 
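A minimal sketch of the traditional assignment, filled in with the X/Y/Z labels from the question (the `c(...)` above is left open in the answer; these particular labels are just the question's example):

```r
library(data.table)

mydt <- data.table(id = 1:6,
                   value = as.factor(c("A", "A", "B", "B", "B", "C")),
                   key = "id")

# `levels<-` works positionally: the i-th new label replaces the i-th
# element of levels(mydt$value), which here is "A", "B", "C".
# Assigning through mydt$value makes a copy of the whole table.
levels(mydt$value) <- c("X", "Y", "Z")

print(mydt)
```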

This should be plenty fast unless mydt is very large, since that traditional syntax copies the entire object. You could also play the un-factoring and re-factoring game... but no one likes that game anyway.
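For reference, the "re-factoring game" can be played without copying the rest of the table by rebuilding just the column with data.table's `:=`. This is a sketch of that alternative, not the answerer's code; the `levels`/`labels` pairs are assumed from the question's example:

```r
library(data.table)

mydt <- data.table(id = 1:6,
                   value = as.factor(c("A", "A", "B", "B", "B", "C")),
                   key = "id")

# Rebuild the factor from its character values. `labels` lists the new
# names in the same order as `levels`, so the old -> new mapping is explicit.
# `:=` replaces the column by reference; the other columns are not copied
# (the new factor itself is, of course, a freshly allocated vector).
mydt[, value := factor(as.character(value),
                       levels = c("A", "B", "C"),
                       labels = c("X", "Y", "Z"))]

print(mydt)
```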

To change the levels by reference, with no copy of mydt:

setattr(mydt$value,"levels",c(...)) 

But be sure to assign a valid levels vector (type character, of sufficient length); otherwise you'll end up with an invalid factor (`levels<-` does some checking as well as copying).
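Since `setattr()` skips the checks that `levels<-` performs, one option is to do a few of them yourself before calling it. In the sketch below, `set_levels_by_ref` is a hypothetical helper (not part of data.table), and the checks it runs are an assumption about what counts as "valid" here: a character vector, one label per existing level, no duplicates. `address()` is data.table's function for reporting an object's memory address, used here to show the column was not copied:

```r
library(data.table)

# Hypothetical helper: validate the new levels, then rename them in place.
set_levels_by_ref <- function(f, new_levels) {
  stopifnot(is.factor(f),
            is.character(new_levels),
            length(new_levels) == nlevels(f),
            !anyDuplicated(new_levels))
  setattr(f, "levels", new_levels)  # modifies f in place, no copy
  invisible(f)
}

mydt <- data.table(id = 1:6,
                   value = as.factor(c("A", "A", "B", "B", "B", "C")),
                   key = "id")

addr_before <- address(mydt$value)
set_levels_by_ref(mydt$value, c("X", "Y", "Z"))
addr_after <- address(mydt$value)  # same column vector as before

print(mydt)
```

Passing a vector of the wrong length (e.g. `c("X", "Y")`) stops with an error instead of silently producing an invalid factor.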

Justin answered Sep 19 '22 19:09



I would rather go the traditional way and reassign the factor levels one at a time:

    > mydt$value # This is what we had originally
    [1] A A B B B C
    Levels: A B C
    > levels(mydt$value) # just checking the levels
    [1] "A" "B" "C"
    > # Meat of the re-assignment
    > levels(mydt$value)[levels(mydt$value)=="A"] <- "X"
    > levels(mydt$value)[levels(mydt$value)=="B"] <- "Y"
    > levels(mydt$value)[levels(mydt$value)=="C"] <- "Z"
    > levels(mydt$value)
    [1] "X" "Y" "Z"
    > mydt # This is what we wanted
       id value
    1:  1     X
    2:  2     X
    3:  3     Y
    4:  4     Y
    5:  5     Y
    6:  6     Z

As you probably noticed, the meat of the re-assignment is very intuitive: it checks for the exact level (use grepl in case you need fuzzy matching, regular expressions or the like).
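A small sketch of the grepl variant, on hypothetical fruit data rather than the question's table. One thing worth knowing: if the pattern matches several levels, they all receive the same label and `levels<-` merges them into a single level:

```r
# Hypothetical data: two levels share the prefix "ap".
f <- factor(c("apple", "apricot", "banana", "cherry", "apple"))

# grepl() selects every level matching the regular expression, so both
# "apple" and "apricot" are renamed and end up merged into one level.
levels(f)[grepl("^ap", levels(f))] <- "A-fruit"

print(levels(f))        # [1] "A-fruit" "banana"  "cherry"
print(as.character(f))
```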

`levels(mydt$value)[levels(mydt$value)=="A"] <- "X"` explicitly looks for the value "A" among the levels of the variable under consideration and reassigns "X" to it (and so on for the other levels). The advantage: you explicitly KNOW which old level became which new label.

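I find renaming levels as in `levels(mydt$value) <- c("X","Y","Z")` very non-intuitive, since it simply assigns X to the first element of `levels(mydt$value)`. For a factor built with factor() or as.factor(), that is the first level in sorted order, not necessarily the first value that appears in the data, so the order really matters.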
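A quick check of that positional behaviour, using a made-up two-value factor where the data and the level order disagree:

```r
# "B" appears first in the data, but factor() sorts the levels,
# so levels(f) is c("A", "B"), not c("B", "A").
f <- factor(c("B", "B", "A"))
print(levels(f))          # [1] "A" "B"

# Positional renaming: "A" -> "X", "B" -> "Y".
levels(f) <- c("X", "Y")
print(as.character(f))    # [1] "Y" "Y" "X"
```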

PS: If there are too many levels to rename one line at a time, use a loop.
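One way to write that loop is a sketch driven by a named vector of old-to-new labels. The vector name `relabel` and the mapping are illustrative assumptions, not from the answer:

```r
library(data.table)

mydt <- data.table(id = 1:6,
                   value = as.factor(c("A", "A", "B", "B", "B", "C")),
                   key = "id")

# Hypothetical mapping: names are the old levels, values the new labels.
relabel <- c(A = "X", B = "Y", C = "Z")

# Same explicit-match assignment as above, once per entry in the mapping.
# Each iteration goes through mydt$value, so mydt is copied every time.
for (old in names(relabel)) {
  levels(mydt$value)[levels(mydt$value) == old] <- relabel[[old]]
}

print(mydt)
```

Because the renames are applied one after another, a mapping whose new labels overlap its old ones (say `A = "B", B = "C"`) would chain and merge levels; in that case a single positional assignment or a rebuilt factor is safer.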

ekta answered Sep 23 '22 19:09
