Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I drop unused levels from a data frame?

Tags:

r

levels

Given the following mock data:

set.seed(123)
x <- data.frame(let = sample(letters[1:5], 100, replace = T), 
                num = sample(1:10, 100, replace = T))
y <- subset(x, let != 'a')

Creating a table of y$let yields

a  b  c  d  e 
0 20 21 22 18

But I don't want a to show anymore. If I try to do this:

levels(y$let) <- factor(y$let)

I mess the frequencies, since now table(y$let) gives me

b  d  c  e 
0 20 21 40 

I'm aware I could do xtabs(~ y$let, drop.unused.levels = T) and work around the problem, but it doesn't reset the variable levels at its core (which is important to me, since this is an early change I'm making to the dataset which will carry on throughout the whole analysis). Moreover, xtabs is a different class from table, which will give me headaches later in the project.

The question is: how can I automatically change levels(y$let) so it doesn't show levels that were dropped when I created the subset? In this case, how can I make it show [1] "b" "c" "d" "e"?

like image 258
Waldir Leoncio Avatar asked Jun 20 '13 15:06

Waldir Leoncio


People also ask

How do you get rid of a factor level?

Removing Levels from a Factor in R Programming – droplevels() Function. droplevels() function in R programming used to remove unused levels from a Factor. droplevels(x, exclude = if(anyNA(levels(x))) NULL else NA, …)

How do you change factor levels in R?

How do I Rename Factor Levels in R? The simplest way to rename multiple factor levels is to use the levels() function. For example, to recode the factor levels “A”, “B”, and “C” you can use the following code: levels(your_df$Category1) <- c("Factor 1", "Factor 2", "Factor 3") .

How do I subset a Dataframe in R?

Subset a Data Frame with Base R Extract[] To specify a logical expression for the rows parameter, use the standard R operators. If subsetting is done by only rows or only columns, then leave the other value blank. For example, to subset the d data frame only by rows, the general form reduces to d[rows,] .


3 Answers

There's a recently added function in R for this:

y <- droplevels(y)
like image 112
Señor O Avatar answered Oct 20 '22 11:10

Señor O


Just do y$let <- factor(y$let). Running factor on an existing factor variable will reset the levels to only those that are present.

like image 34
Hong Ooi Avatar answered Oct 20 '22 10:10

Hong Ooi


Adding to Hong Ooi's answer, here is an example I found from R-Bloggers.

# Create some fake data
x <- as.factor(sample(head(colors()),100,replace=TRUE))
levels(x)
x <- x[x!="aliceblue"]
levels(x) # still the same levels
table(x) # even though one level has 0 entries!

The solution is simple: run factor() again:
x <- factor(x)
levels(x)
like image 41
CRich Avatar answered Oct 20 '22 11:10

CRich