Drop unused factor levels in a subsetted data frame

I have a data frame containing a factor. When I create a subset of this data frame using subset or another indexing function, a new data frame is created. However, the factor variable retains all of its original levels, even if they no longer occur in the new data frame.

This causes problems when doing faceted plotting or using functions that rely on factor levels.

What is the most succinct way to remove the unused levels from a factor in the new data frame?

Here's an example:

    df <- data.frame(letters = letters[1:5],
                     numbers = 1:5,
                     stringsAsFactors = TRUE)  # needed since R 4.0.0 to get a factor

    levels(df$letters)
    ## [1] "a" "b" "c" "d" "e"

    subdf <- subset(df, numbers <= 3)
    subdf
    ##   letters numbers
    ## 1       a       1
    ## 2       b       2
    ## 3       c       3

    # all levels are still there!
    levels(subdf$letters)
    ## [1] "a" "b" "c" "d" "e"
medriscoll asked Jul 28 '09 18:07


2 Answers

Since R version 2.12, there's a droplevels() function.

    levels(droplevels(subdf$letters))
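
droplevels() also has a data frame method, so it can be applied to the whole subset in one call instead of column by column. A minimal sketch, reusing the df and subdf names from the question; stringsAsFactors = TRUE is set explicitly here as an assumption, because since R 4.0.0 data.frame() no longer converts strings to factors by default:

```r
# Rebuild the question's example; stringsAsFactors = TRUE makes `letters` a factor
df <- data.frame(letters = letters[1:5], numbers = 1:5,
                 stringsAsFactors = TRUE)
subdf <- subset(df, numbers <= 3)

# The data frame method drops unused levels from every factor column
subdf <- droplevels(subdf)
levels(subdf$letters)
## [1] "a" "b" "c"
```

Non-factor columns such as numbers are returned unchanged.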
Roman Luštrik answered Oct 11 '22 09:10


All you should have to do is apply factor() to your variable again after subsetting:

    > subdf$letters
    [1] a b c
    Levels: a b c d e
    > subdf$letters <- factor(subdf$letters)
    > subdf$letters
    [1] a b c
    Levels: a b c

EDIT

From the examples on the factor help page (?factor):

    factor(ff)  # drops the levels that do not occur

To drop unused levels from all factor columns in a data frame, you can use:

    subdf <- subset(df, numbers <= 3)
    subdf[] <- lapply(subdf, function(x) if(is.factor(x)) factor(x) else x)
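
The same lapply() idea can be wrapped in a small helper. This is only a sketch: drop_unused_levels is a hypothetical name, not a base R function, and the sizes data frame is made up for illustration. It also shows that factor() keeps the existing order of the remaining levels and leaves non-factor columns untouched:

```r
# Hypothetical helper: re-create every factor column so unused levels are dropped
drop_unused_levels <- function(d) {
  d[] <- lapply(d, function(x) if (is.factor(x)) factor(x) else x)
  d
}

# Illustrative data with a custom (non-alphabetical) level order
sizes <- data.frame(
  size = factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large")),
  n    = c(10, 3, 7, 12)
)

# Keep rows with n >= 7; "large" no longer occurs in the subset
big <- drop_unused_levels(subset(sizes, n >= 7))
levels(big$size)
## [1] "small"  "medium"
```

Assigning into d[] rather than d keeps the result a data frame with its original column names.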
hatmatrix answered Oct 11 '22 08:10