Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

`levels<-`( What sorcery is this?

Tags:

types

r

levels

The answers here are good, but they are missing an important point. Let me try and describe it.

R is a functional language and does not like to mutate its objects. But it does allow assignment statements, using replacement functions:

levels(x) <- y

is equivalent to

x <- `levels<-`(x, y)

The trick is, this rewriting is done by <-; it is not done by levels<-. levels<- is just a regular function that takes an input and gives an output; it does not mutate anything.

One consequence of that is that, according to the above rule, <- must be recursive:

levels(factor(x)) <- y

is

factor(x) <- `levels<-`(factor(x), y)

is

x <- `factor<-`(x, `levels<-`(factor(x), y))

It's kind of beautiful that this pure-functional transformation (up until the very end, where the assignment happens) is equivalent to what an assignment would be in an imperative language. If I remember correctly this construct in functional languages is called a lens.

But then, once you have defined replacement functions like levels<-, you get another, unexpected windfall: you don't just have the ability to make assignments, you have a handy function that takes in a factor, and gives out another factor with different levels. There's really nothing "assignment" about it!

So, the code you're describing is just making use of this other interpretation of levels<-. I admit that the name levels<- is a little confusing because it suggests an assignment, but this is not what is going on. The code is simply setting up a sort of pipeline:

  • Start with dat$product

  • Convert it to a factor

  • Change the levels

  • Store that in res

Personally, I think that line of code is beautiful ;)


No sorcery, that's just how (sub)assignment functions are defined. levels<- is a little different because it is a primitive to (sub)assign the attributes of a factor, not the elements themselves. There are plenty of examples of this type of function:

`<-`              # assignment
`[<-`             # sub-assignment
`[<-.data.frame`  # sub-assignment data.frame method
`dimnames<-`      # change dimname attribute
`attributes<-`    # change any attributes

Other binary operators can be called like that too:

`+`(1,2)  # 3
`-`(1,2)  # -1
`*`(1,2)  # 2
`/`(1,2)  # 0.5

Now that you know that, something like this should really blow your mind:

Data <- data.frame(x=1:10, y=10:1)
names(Data)[1] <- "HI"              # How does that work?!? Magic! ;-)

The reason for that "magic" is that the "assignment" form must have a real variable to work on. And the factor(dat$product) wasn't assigned to anything.

# This works since its done in several steps
x <- factor(dat$product)
levels(x) <- list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
x

# This doesn't work although it's the "same" thing:
levels(factor(dat$product)) <- list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
# Error: could not find function "factor<-"

# and this is the magic work-around that does work
`levels<-`(
  factor(dat$product),
  list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
  )

For user-code I do wonder why such language manipulations are used so? You ask what magic is this and others have pointed out that you are calling the replacement function that has the name levels<-. For most people this is magic and really the intended use is levels(foo) <- bar.

The use-case you show is different because product doesn't exist in the global environment so it only ever exists in the local environment of the call to levels<- thus the change you want to make does not persist - there was no reassignment of dat.

In these circumstances, within() is the ideal function to use. You would naturally wish to write

levels(product) <- bar

in R but of course product doesn't exist as an object. within() gets around this because it sets up the environment you wish to run your R code against and evaluates your expression within that environment. Assigning the return object from the call to within() thus succeeds in the properly modified data frame.

Here is an example (you don't need to create new datX - I just do that so the intermediary steps remain at the end)

## one or t'other
#dat2 <- transform(dat, product = factor(product))
dat2 <- within(dat, product <- factor(product))

## then
dat3 <- within(dat2, 
               levels(product) <- list(Tylenol=1:3, Advil=4:6, 
                                       Bayer=7:9, Generic=10:12))

Which gives:

> head(dat3)
  product
1 Generic
2 Generic
3   Bayer
4   Bayer
5   Advil
6 Tylenol
> str(dat3)
'data.frame':   20 obs. of  1 variable:
 $ product: Factor w/ 4 levels "Tylenol","Advil",..: 4 4 3 3 2 1 4 2 3 4 ...

I struggle to see how constructs like the one you show are useful in the majority of cases - if you want to change the data, change the data, don't create another copy and change that (which is all the levels<- call is doing after all).