
How to drop unused levels in table with data.table?

Tags:

r

data.table

Consider the following data.table:

library(data.table)

x <- data.table(
          x=sample(letters[1:5], 10, replace=TRUE),
          y=factor(sample(letters[1:5], 10, replace=TRUE), levels=letters))

This situation comes up often when working with data.tables: some of the factor columns carry levels that do not occur in the data.

Now, if we call table() on it:

table(x)

A giant table that includes all the unused levels shows up. Is there a way, in the table methods or in data.table, to drop the unused levels?

I know that the following is possible:

x$y <- factor(x$y)

But this is not useful because I don't want to save each of the sub-tables to a different variable.

Shambho asked Mar 05 '15 15:03


1 Answer

You can use droplevels as follows:

x[,y:=droplevels(y)]

This overwrites y by reference with droplevels(y), so the unused levels are removed from x itself.

This results in:

> table(x)
   y
x   b c d e
  a 1 1 1 2
  b 0 1 0 0
  c 1 0 0 0
  d 1 0 0 0
  e 0 0 2 0
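
If you would rather not modify the table at all, which is what the question seems to be after, one option is to call droplevels inside j and tabulate there. The following is only a sketch: the table name dt and the seed are assumptions made for illustration, not part of the original post.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dt <- data.table(
  x = sample(letters[1:5], 10, replace = TRUE),
  y = factor(sample(letters[1:5], 10, replace = TRUE), levels = letters)
)

# Drop the unused levels inside j only; nothing is assigned,
# so dt$y still has all 26 levels afterwards
tab <- dt[, table(x, y = droplevels(y))]
print(tab)

# The same idea on a row subset, without storing the subset in a variable
sub_tab <- dt[x %in% c("a", "b"), table(x, y = droplevels(y))]
print(sub_tab)
```

Because no := is used here, each call returns a fresh table object and the levels of the original column are left as they were.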
Rentrop answered Sep 29 '22 14:09