
How to group factor levels?

Tags: r

I have a factor column of football position abbreviations: about 17 unique values across 220 observations. I want to collapse those 17 values into just three factor levels.

levels(nfldraft$Pos) <- list(Linemen = c("C","OG","OT","TE","DT","DE"),
                             Small_Backs =  c("CB","WR","FS"), 
                             Big_Backs = c("FB","ILB","OLB","P","QB",
                                           "RB","SS","WR"))

This is what I tried. Printing nfldraft$Pos to the console shows three factor levels, but every value is either "Linemen" or "Small_Backs"; all the others are NA. Where am I going wrong?

Asked by Amin Sammara on Aug 13 '16 at 16:08



2 Answers

I made up an example character vector with all of the abbreviations:

my_example <- c("C","OG","OT","TE","DT","DE","CB","WR","FS", 
                "FB","ILB","OLB","P","QB","RB","SS","WR")
class(my_example)
# [1] "character"

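Then I substituted the desired group names for the abbreviations (you could also use gsub here, or any of many other approaches). Note that "WR" appears in two of the question's groups; because the replacements run in order, it ends up in "Small Backs":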

my_example[my_example %in% c("C","OG","OT","TE","DT","DE")] <- "Linemen"
my_example[my_example %in% c("CB","WR","FS")]               <- "Small Backs"
my_example[my_example %in% c("FB","ILB","OLB","P",
                             "QB","RB","SS","WR")]          <- "Big Backs"

Then I made it into a factor:

my_example <- as.factor(my_example)
head(my_example)
# [1] Linemen Linemen Linemen Linemen Linemen Linemen
# Levels: Big Backs Linemen Small Backs
tail(my_example)
# [1] Big Backs   Big Backs   Big Backs   Big Backs   Big Backs   Small Backs
# Levels: Big Backs Linemen Small Backs
class(my_example)
# [1] "factor"
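As one of the "many other approaches", here is a minimal sketch using a named lookup vector built from a list of groups. The names `spec`, `lookup`, and `pos` are made up for illustration, and "WR" is placed in only one group here, which is an assumption about what the question intended.

```r
# Hypothetical grouping; "WR" is assigned to a single group on purpose.
spec <- list(
  "Linemen"     = c("C", "OG", "OT", "TE", "DT", "DE"),
  "Small Backs" = c("CB", "WR", "FS"),
  "Big Backs"   = c("FB", "ILB", "OLB", "P", "QB", "RB", "SS")
)

# Lookup vector: names are abbreviations, values are group names.
lookup <- setNames(rep(names(spec), lengths(spec)),
                   unlist(spec, use.names = FALSE))

# Made-up sample input; "K" is deliberately not in any group.
pos <- c("C", "QB", "WR", "CB", "DE", "K")

# Index the lookup by abbreviation; unmapped values come back as NA.
grouped <- factor(unname(lookup[pos]), levels = names(spec))
grouped
```

Because the mapping lives in one list, adding or moving a position is a single edit, and the factor levels keep the order given in `spec` rather than alphabetical order.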

Answered by Hack-R on Oct 18 '22 at 04:10


This is a good illustration of why a fully reproducible example matters: the OP's code looks like it should work. Borrowing @Hack-R's sample input:

my_example <- c("C","OG","OT","TE","DT","DE","CB","WR","FS", 
                "FB","ILB","OLB","P","QB","RB","SS","WR")

OP's original code works as-is:

nfldraft <- list(Pos = factor(my_example))
levels(nfldraft$Pos) <- list(
  Linemen     = c("C","OG","OT","TE","DT","DE"),
  Small_Backs = c("CB","WR","FS"),
  # "WR" is listed twice; the later entry (Big_Backs) wins
  Big_Backs   = c("FB","ILB","OLB","P","QB","RB","SS","WR")
)
table(nfldraft$Pos)
#     Linemen Small_Backs   Big_Backs 
#           6           2           9 

This is exactly in line with the documentation for how to use levels<-:

levels(x) <- value

value: A valid value for levels(x)... For the factor method, a vector of character strings with length at least the number of levels of x, or a named list specifying how to rename the levels.
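To make the two forms in that description concrete, here is a small sketch on a made-up factor `f` (not the OP's data). Both forms merge "b" and "c" into one level.

```r
f <- factor(c("a", "b", "c", "a"))

# Form 1: a character vector, matched to the existing levels by position.
# Repeating a label merges the corresponding levels.
g <- f
levels(g) <- c("first", "rest", "rest")

# Form 2: a named list of the shape new_level = c(old levels).
h <- f
levels(h) <- list(first = "a", rest = c("b", "c"))

levels(g)
levels(h)
```

The named-list form is easier to read when many old levels map to a few new ones, which is the situation in the question.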

So it seems something else is wrong with the OP's input.
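One possible cause, which is only a guess since the real data was not posted, is that some levels in the data do not exactly match the strings in the list (stray whitespace, a different abbreviation). With the named-list form, any existing level not mentioned in the list becomes NA. A sketch of how one might check, using made-up input in `pos`:

```r
spec <- list(
  Linemen     = c("C", "OG", "OT", "TE", "DT", "DE"),
  Small_Backs = c("CB", "WR", "FS"),
  Big_Backs   = c("FB", "ILB", "OLB", "P", "QB", "RB", "SS")
)

# Hypothetical raw data: " QB" has a leading space, "K" is in no group.
pos <- factor(c("C", "CB", " QB", "K", "RB"))

# Levels present in the data but absent from the list would turn into NA.
unmatched <- setdiff(levels(pos), unlist(spec))
unmatched

# Strip stray whitespace from the levels, then regroup.
levels(pos) <- trimws(levels(pos))
levels(pos) <- spec
pos
```

Running `setdiff(levels(nfldraft$Pos), unlist(...))` on the real data before regrouping would show which abbreviations, if any, are being dropped.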

Answered by MichaelChirico on Oct 18 '22 at 04:10