Why is the terminology of labels and levels in factors so weird?

Tags: r, levels, factors

An example of a non-settable function would be labels. You can only set factor labels when they are created with the factor function. There is no labels<- function. Not that 'labels' and 'levels' in factors make any sense....

>  fac <- factor(1:3, labels=c("one", "two", "three"))
> fac
[1] one   two   three
Levels: one two three
> labels(fac)
[1] "1" "2" "3"

OK, I asked for labels, which one might assume were as set by the factor call, but I get something quite ... what's the word, unintuitive?

> levels(fac)
[1] "one"   "two"   "three"

So it appears that setting labels is really setting levels.

>  fac <- factor(1:3, levels=c("one", "two", "three"))
> levels(fac)
[1] "one"   "two"   "three"

OK that is as expected. So what are labels when one sets levels?

>  fac <- factor(1:3, levels=c("one", "two", "three"), labels=c("x","y", "z") )
> labels(fac)
[1] "1" "2" "3"
> levels(fac)
[1] "x" "y" "z"

Effing weird, if you ask me. It would seem that the labels argument to factor trumps the levels argument when it comes to specifying the levels. Why should this be? It seems like confused terminology. And why does labels() return what I would have expected to retrieve with as.character(as.numeric(fac))?

(This was a tangential comment [labelled as such] in an earlier answer about assignment functions, which I was asked to move to its own question. So here's your opportunity to enlighten me.)

asked Aug 19 '11 23:08

IRTFM

2 Answers

I think the way to think about the difference between labels and levels (ignoring the labels() function that Tommy describes in his answer) is this: levels tells R which values to look for in the input (x) and what order to use for the levels of the resulting factor object, while labels changes the values of the levels after the input has been coded as a factor. As Tommy's answer suggests, no part of the factor object returned by factor() is called labels; there are just the levels, which have been adjusted by the labels argument ... (clear as mud).

For example:

> f <- factor(x=c("a","b","c"),levels=c("c","d","e"))
> f
[1] <NA> <NA> c
Levels: c d e
> str(f)
 Factor w/ 3 levels "c","d","e": NA NA 1

Because the first two elements of x were not found in levels, the first two elements of f are NA. Because "d" and "e" were included in levels, they show up in the levels of f even though they did not occur in x.
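As a minimal sketch of the two jobs that levels does (matching values and fixing their order), here is a small example. The names x, f1 and f2 are made up for illustration and are not from the original post.

```r
x <- c("lo", "hi", "mid", "hi")

# Default: the levels are the sorted unique values of x
f1 <- factor(x)
levels(f1)        # "hi" "lo" "mid"

# Explicit levels: you choose the order, and any value not listed becomes NA
f2 <- factor(x, levels = c("lo", "hi"))
levels(f2)        # "lo" "hi"
as.integer(f2)    # 1 2 NA 2
```

The integer codes in f2 are positions in the levels vector, which is why "mid" has no code at all.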

Now with labels:

> f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("C","D","E"))
> f
[1] <NA> <NA> C
Levels: C D E

After R figures out what should be in the factor, it re-codes the levels. One can of course use this to do brain-frying things such as:

> f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("a","b","c"))
> f
[1] <NA> <NA> a
Levels: a b c

Another way to think about labels is that factor(x,levels=L1,labels=L2) is equivalent to

f <- factor(x,levels=L1)
levels(f) <- L2
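The equivalence can be checked directly. This is only a sketch for one input, reusing the values from the examples above; g1 and g2 are illustrative names.

```r
x  <- c("a", "b", "c")
L1 <- c("c", "d", "e")
L2 <- c("C", "D", "E")

# One step: match against L1, then relabel with L2
g1 <- factor(x, levels = L1, labels = L2)

# Two steps: build the factor first, then overwrite its levels
g2 <- factor(x, levels = L1)
levels(g2) <- L2

identical(g1, g2)   # TRUE
```

Both routes end up with the same integer codes and the same levels attribute, which is the sense in which labels is "just" a relabelling of levels.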

I think an appropriately phrased version of this example might be nice for Pat Burns's R inferno -- there are plenty of factor puzzles in section 8.2, but not this particular one ...

answered Sep 19 '22 22:09

Ben Bolker

The labels function sounds like the perfect fit for getting the labels of a factor.

...but the labels function has nothing to do with factors! It is used as a generic way of getting something to "label" an object. For atomic vectors, this would be the names. But if there are no names, the labels function returns the element indices coerced to strings - something like as.character(seq_along(x)).

...So that's what you're seeing when you try labels on a factor. The factor is an integer vector without any names, but with a levels attribute.
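A short sketch of that behaviour, comparing a named vector, an unnamed vector and a factor. The variable names are illustrative.

```r
v_named   <- c(a = 10, b = 20)
v_unnamed <- c(10, 20)

labels(v_named)     # "a" "b"  (the names)
labels(v_unnamed)   # "1" "2"  (indices as strings)

fac <- factor(c("x", "y", "x"))
names(fac)          # NULL, so labels() falls back to the indices
labels(fac)         # "1" "2" "3"
typeof(fac)         # "integer"
```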

A factor has no labels. It only has levels. The labels argument to factor is just a way to match against one set of strings but produce another set of strings as the levels... But to confuse things further, the dput function prints the levels attribute as .Label! I think that is a legacy thing...

# Translate lower case letters to upper case.
f <- factor(letters[2:4], letters[1:3], LETTERS[1:3])
dput(f)
#structure(c(2L, 3L, NA), .Label = c("A", "B", "C"), class = "factor")
attributes(f)
#$levels
#[1] "A" "B" "C"
#
#$class
#[1] "factor"

However, since labels is a generic function, it would probably be a good idea to define labels.factor as follows (currently there is none). Perhaps something for R core to consider?

labels.factor <- function(x, ...) as.character(x)
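For illustration, here is how that method would behave on the factor from the dput example. Note that labels.factor is the method proposed in this answer; it is defined in the snippet itself rather than supplied by base R.

```r
# Proposed method from this answer (user-defined, not part of base R)
labels.factor <- function(x, ...) as.character(x)

f <- factor(letters[2:4], levels = letters[1:3], labels = LETTERS[1:3])

# With the method defined, labels() dispatches to it for factors
labels(f)   # "B" "C" NA
```

Without the method, the same call would fall through to the default and return the element indices as strings.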

answered Sep 16 '22 22:09

Tommy
