
Confusion between factor levels and factor labels

Tags: r, r-faq, r-factor


In short: levels are the input and labels are the output of the factor() function. A factor has only a levels attribute, which is set from the labels argument of factor() (or from the levels themselves when labels is omitted). This differs from the concept of labels in statistical packages such as SPSS, and can be confusing at first.

What you do in this line of code

df$f <- factor(df$f, levels=c('a','b','c'),
  labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))

tells R that there is a vector df$f

  • which you want to transform into a factor,
  • in which the different levels are coded as a, b, and c
  • and for which you want the levels to be labeled as Treatment A etc.

The factor function looks for the values a, b and c, converts them to internal integer codes, and stores the label values in the levels attribute of the factor. This attribute is used to convert the internal integer codes back to the correct labels. But as you can see, there is no labels attribute.

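> # stringsAsFactors=TRUE is needed in R >= 4.0.0, where strings are no longer converted to factors by default
> df <- data.frame(v=c(1,2,3),f=c('a','b','c'), stringsAsFactors=TRUE)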
> attributes(df$f)
$levels
[1] "a" "b" "c"

$class
[1] "factor"

> df$f <- factor(df$f, levels=c('a','b','c'),
+   labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))    
> attributes(df$f)
$levels
[1] "Treatment A: XYZ" "Treatment B: YZX" "Treatment C: ZYX"

$class
[1] "factor"
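
The internal numerical values that the levels attribute maps to labels can be inspected directly. The following is a minimal sketch in base R; the vector x and the shortened label strings are made up for illustration, and the value "d" is included only to show what happens to an input that is not listed in levels.

```r
# Illustrative input; "d" is deliberately not among the declared levels
x <- c("a", "c", "b", "a", "d")

f <- factor(x, levels = c("a", "b", "c"),
            labels = c("Treatment A", "Treatment B", "Treatment C"))

# Internal integer codes: the position of each value in `levels`
as.integer(f)
# [1]  1  3  2  1 NA

# The labels end up in the levels attribute; the original "a", "b", "c" are gone
levels(f)
# [1] "Treatment A" "Treatment B" "Treatment C"

# Converting back to character looks the codes up in the levels attribute
as.character(f)
# [1] "Treatment A" "Treatment C" "Treatment B" "Treatment A" NA

# Only two attributes exist; there is no labels attribute
names(attributes(f))
# [1] "levels" "class"
```

Values that do not match any entry in levels become NA, which is a common source of surprises when the levels argument contains a typo.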

I wrote a package "lfactors" that allows you to refer to either levels or labels.

# packages
install.packages("lfactors")
library(lfactors)

flips <- lfactor(c(0,1,1,0,0,1), levels=0:1, labels=c("Tails", "Heads"))
# Tails can now be referred to as either "Tails" or 0
# These two lines return the same result
flips == "Tails"
#[1]  TRUE FALSE FALSE  TRUE  TRUE FALSE
flips == 0 
#[1]  TRUE FALSE FALSE  TRUE  TRUE FALSE

Note that an lfactor requires that the levels be numeric so that they cannot be confused with the labels.
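
For comparison, here is a sketch of how a plain base R factor built from the same data behaves. It uses only base R; the names flips_base and original are hypothetical, and the lookup in the last step is just one possible way to recover the numeric codes, not something taken from the lfactors package.

```r
# Same data as above, but as an ordinary base R factor
flips_base <- factor(c(0, 1, 1, 0, 0, 1), levels = 0:1,
                     labels = c("Tails", "Heads"))

# Comparing against a label works as expected
flips_base == "Tails"
# [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE

# Comparing against the original level does not: the factor only knows its labels
flips_base == 0
# [1] FALSE FALSE FALSE FALSE FALSE FALSE

# One way to get the original values back: index the original levels
# by the internal integer codes (assumes you still know the levels used)
original <- (0:1)[as.integer(flips_base)]
original == 0
# [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE
```

Because the original levels are discarded once labels are applied, the base R approach depends on keeping the level vector around separately.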