How does one change the levels of a factor column in a data.table

Q: What is a level of a factor?

The number of levels of a factor or independent variable is equal to the number of variations of that factor that were used in the experiment. If an experiment compared the drug dosages 50 mg, 100 mg, and 150 mg, then the factor "drug dosage" would have three levels: 50 mg, 100 mg, and 150 mg.

Q: What is a factor level variable?

Factors are the variables that experimenters control during an experiment in order to determine their effect on the response variable. A factor can take on only a small number of values, which are known as factor levels.

Tags:

r

data.table

What is the correct way to change the levels of a factor column in a data.table (note: not a data frame)?

    library(data.table)
    mydt <- data.table(id=1:6, value=as.factor(c("A", "A", "B", "B", "B", "C")), key="id")
    mydt[, levels(value)]
    [1] "A" "B" "C"

I am looking for something like:

mydt[, levels(value) <- c("X", "Y", "Z")]

But of course, the above line does not work.

    # Actual                  # Expected result
    > mydt                    > mydt
       id value                  id value
    1:  1     A               1:  1     X
    2:  2     A               2:  2     X
    3:  3     B               3:  3     Y
    4:  4     B               4:  4     Y
    5:  5     B               5:  5     Y
    6:  6     C               6:  6     Z

asked Jan 31 '13 20:01

Ricardo Saporta

2 Answers

You can still set them the traditional way:

levels(mydt$value) <- c(...)

This should be plenty fast unless mydt is very large, since that traditional syntax copies the entire object. You could also play the un-factoring and refactoring game (converting to character and back to a factor)... but no one likes that game anyway.
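Both routes from this answer, as a runnable sketch on two fresh copies of the question's table. make_dt() is just a hypothetical convenience function. The second route is one version of the refactoring game: factor()'s levels and labels arguments spell out the old-to-new mapping, and := swaps the new column in without copying the rest of the table.

```r
library(data.table)

# Hypothetical helper that rebuilds the table from the question
make_dt <- function() {
  data.table(id = 1:6,
             value = as.factor(c("A", "A", "B", "B", "B", "C")),
             key = "id")
}

# 1) Traditional replacement. New labels are matched to levels(value) by
#    position ("A" -> "X", "B" -> "Y", "C" -> "Z"); going through `$<-`
#    copies the whole table.
dt1 <- make_dt()
levels(dt1$value) <- c("X", "Y", "Z")

# 2) Refactoring with `:=`: build a new factor with an explicit mapping and
#    assign it as the column by reference.
dt2 <- make_dt()
dt2[, value := factor(value,
                      levels = c("A", "B", "C"),
                      labels = c("X", "Y", "Z"))]

print(levels(dt1$value))        # "X" "Y" "Z"
print(as.character(dt2$value))  # "X" "X" "Y" "Y" "Y" "Z"
```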

To change the levels by reference, with no copy of mydt:

setattr(mydt$value,"levels",c(...))

but be sure to assign a valid levels vector (type character, of sufficient length); otherwise you'll end up with an invalid factor (levels<- does some checking as well as copying).
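A sketch of the by-reference route with a guard in front of it. set_levels_checked() is a hypothetical helper, not part of data.table: it runs a few checks similar in spirit to those levels<- performs, then calls data.table::setattr(). The alias variable is there to show that mydt was not copied: a plain <- on a data.table does not copy, so both names see the new levels.

```r
library(data.table)

mydt <- data.table(id = 1:6,
                   value = as.factor(c("A", "A", "B", "B", "B", "C")),
                   key = "id")
alias <- mydt  # plain `<-` does not copy the data.table; both names share it

# Hypothetical helper: validate the new levels, then set them by reference
set_levels_checked <- function(f, new_levels) {
  stopifnot(is.factor(f),
            is.character(new_levels),
            length(new_levels) == nlevels(f),  # one label per existing level
            !anyNA(new_levels),
            !anyDuplicated(new_levels))        # duplicated levels would give an invalid factor
  setattr(f, "levels", new_levels)             # modifies the column in place
  invisible(f)
}

set_levels_checked(mydt$value, c("X", "Y", "Z"))

print(levels(mydt$value))   # "X" "Y" "Z"
print(levels(alias$value))  # "X" "Y" "Z" as well, since no copy was made
```

Requiring exactly one unique label per existing level is stricter than levels<- (which also accepts a longer vector and merges duplicated labels); that is a deliberate choice for this sketch, not data.table behaviour.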

answered Sep 19 '22 19:09

Justin

I would rather go the traditional way of re-assignment to the factors

    > mydt$value  # This is what we had originally
    [1] A A B B B C
    Levels: A B C
    > levels(mydt$value)  # just checking the levels
    [1] "A" "B" "C"
    # Meat of the re-assignment
    > levels(mydt$value)[levels(mydt$value)=="A"] <- "X"
    > levels(mydt$value)[levels(mydt$value)=="B"] <- "Y"
    > levels(mydt$value)[levels(mydt$value)=="C"] <- "Z"
    > levels(mydt$value)
    [1] "X" "Y" "Z"
    > mydt  # This is what we wanted
       id value
    1:  1     X
    2:  2     X
    3:  3     Y
    4:  4     Y
    5:  5     Y
    6:  6     Z

As you probably noticed, the meat of the re-assignment is very intuitive: it checks for the exact level (use grepl for fuzzy matching, regular expressions, or the like).

The line

    levels(mydt$value)[levels(mydt$value)=="A"] <- "X"

explicitly looks up the value "A" in the levels of the variable under consideration and then relabels that level as X (and so on). The advantage: you know exactly which old level gets which new label.
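The grepl variant mentioned above, sketched on a plain factor with made-up level names (the same expression works on mydt$value). Note that when two levels receive the same label, levels<- merges them into one level.

```r
# Made-up data: two levels share the "apple" prefix
f <- factor(c("apple_1", "apple_2", "banana", "cherry"))

# grepl() returns a logical index over levels(f); every match is relabelled "X"
levels(f)[grepl("^apple", levels(f))] <- "X"

print(levels(f))        # "X" "banana" "cherry"
print(as.character(f))  # "X" "X" "banana" "cherry"
```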

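I find renaming levels with levels(mydt$value) <- c("X","Y","Z") very unintuitive, because it is purely positional: it assigns X to whichever level comes first in levels(mydt$value), which is not necessarily the first value seen in the data (factor() sorts the levels by default). So the order really matters.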
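A small base R check of that positional behaviour: the data contain "B" before "A", but factor() sorts the levels, so the first new label goes to "A", not to the first value in the data.

```r
f <- factor(c("B", "A", "B"))
print(levels(f))        # "A" "B"  (sorted, not order of appearance)

# Positional renaming follows levels(f): "A" -> "X", "B" -> "Y"
levels(f) <- c("X", "Y")
print(as.character(f))  # "Y" "X" "Y"
```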

PS: If there are many levels, use a loop.
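One way to follow the looping advice, assuming the renaming is stored in a named character vector. mapping is a hypothetical name: its names are the old levels and its values the new labels.

```r
library(data.table)

mydt <- data.table(id = 1:6,
                   value = as.factor(c("A", "A", "B", "B", "B", "C")),
                   key = "id")

# Hypothetical mapping: old level = name, new label = value
mapping <- c(A = "X", B = "Y", C = "Z")

# The same explicit old == new check as above, once per mapping entry
for (old in names(mapping)) {
  levels(mydt$value)[levels(mydt$value) == old] <- mapping[[old]]
}

print(levels(mydt$value))  # "X" "Y" "Z"
```

This assumes the new labels do not collide with old level names that are still to be processed: a mapping such as c(A = "B", B = "C") would merge A into B on the first pass and then turn both into C.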

answered Sep 23 '22 19:09

ekta

