
drop = TRUE doesn't drop factor levels in data.frame while in vector it does

There is an interesting option, drop = TRUE, for data.frame subsetting. See this excerpt from help('[.data.frame'):

Usage

S3 method for class 'data.frame'

x[i, j, drop = ]

But when I try it on a data.frame, it doesn't drop the unused factor levels:

> df = data.frame(a = c("europe", "asia", "oceania"), b = c(1, 2, 3), stringsAsFactors = TRUE)
>
> df[1:2,, drop = TRUE]$a
[1] europe asia  
Levels: asia europe oceania     <--- oceania shouldn't be here!!
>

I know there are other ways, like

df2 <- droplevels(df[1:2,])

but the documentation seemed to promise a much more elegant way to do this, so why doesn't it work? Is it a bug? I don't understand how this could be a feature...

EDIT: I was confused because drop = TRUE does drop unused factor levels when subsetting a factor vector. It is not very intuitive that x[i, drop = TRUE] drops factor levels and x[i, j, drop = TRUE] does not!
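For reference, here is a minimal sketch of the difference described in the edit. The names f and df_sub are illustrative only, and stringsAsFactors = TRUE is set explicitly on the assumption that the code may run on R 4.0.0 or later, where data.frame() no longer converts strings to factors by default.

```r
# A factor on its own: `[` for factors has a drop argument that
# removes unused levels from the result.
f <- factor(c("europe", "asia", "oceania"))
levels(f[1:2])               # all three levels are kept
levels(f[1:2, drop = TRUE])  # only "asia" and "europe" remain

# The same factor as a data.frame column: here drop only concerns
# dimensions, so the unused level survives the subsetting.
df <- data.frame(a = c("europe", "asia", "oceania"), b = c(1, 2, 3),
                 stringsAsFactors = TRUE)
df_sub <- df[1:2, , drop = TRUE]
levels(df_sub$a)             # still includes "oceania"
```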

Tomas asked Jan 02 '13 14:01




1 Answer

The documentation clearly states:

drop : logical. If TRUE the result is coerced to the lowest possible dimension. The default is to drop if only one column is left, but not to drop if only one row is left.

This means that drop controls dimensions only. If drop = TRUE and the subset consists of a single column, the result is returned as a plain vector; if it consists of a single row, the result is returned as a list. In neither case do you get back a single-column or single-row data.frame.
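A short sketch of that behaviour, reusing the data from the question (the variable names col_vec, col_df, row_df and row_list are made up for illustration, and stringsAsFactors = TRUE is an assumption so that column a is a factor on current R versions):

```r
df <- data.frame(a = c("europe", "asia", "oceania"), b = c(1, 2, 3),
                 stringsAsFactors = TRUE)

# Single column: dropped to a vector by default, kept as a
# data.frame only when drop = FALSE is given.
col_vec <- df[, "b"]
col_df  <- df[, "b", drop = FALSE]
is.data.frame(col_vec)   # FALSE
is.data.frame(col_df)    # TRUE

# Single row: kept as a data.frame by default, dropped to a list
# only when drop = TRUE is given.
row_df   <- df[1, ]
row_list <- df[1, , drop = TRUE]
is.data.frame(row_df)    # TRUE
is.data.frame(row_list)  # FALSE, it is a plain list
```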

Therefore, this argument has nothing to do with dropping factor levels, and the right way to remove unused levels is the one you mentioned (i.e. using the droplevels function).
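As a minimal sketch of that approach on the question's data (df2 is the name used in the question; stringsAsFactors = TRUE is assumed so that the example also behaves this way on R 4.0.0 and later):

```r
df <- data.frame(a = c("europe", "asia", "oceania"), b = c(1, 2, 3),
                 stringsAsFactors = TRUE)

# Subset first, then remove the levels that no longer occur.
df2 <- droplevels(df[1:2, ])
levels(df2$a)   # "asia" "europe"

# The original data.frame is left untouched.
levels(df$a)    # "asia" "europe" "oceania"
```

droplevels() applied to a data.frame processes every factor column, so non-factor columns such as b pass through unchanged.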

digEmAll answered Nov 09 '22 22:11