
Filter factor levels in R using dplyr

Tags: r, dplyr

This is the glimpse() of my dataframe DF:

Observations: 221184
Variables:
$ Epsilon    (fctr) 96002.txt, 96002.txt, 96004.txt, 96004.txt, 96005.txt, 960...
$ Value   (int) 61914, 61887, 61680, 61649, 61776, 61800, 61753, 61725, 616...

I want to filter (remove) all the observations with the first two levels of Epsilon using dplyr.

I mean:

DF %>% filter(Epsilon != "96002.txt" & Epsilon != "96004.txt")

However, I don't want to use the string values (i.e., "96002.txt" and "96004.txt") but the level orders (i.e., 1 and 2), because it should be a general instruction independent of the level values.

Medical physicist asked May 05 '15 11:05




1 Answer

Calling as.integer() on a factor returns each observation's level index (1 for the first level, 2 for the second, and so on), so you can put a condition on that index. Just replace your filter statement with:

 DF %>% filter(as.integer(Epsilon) > 2)

More generally, if you have a vector of level indices you want to eliminate, you can try:

 # some arbitrary level indices we don't want
 nonWantedLevels <- c(5, 6, 9, 12, 13)
 # just the filter part: keep rows whose level index is not in the vector
 filter(!(as.integer(Epsilon) %in% nonWantedLevels))
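
As a minimal end-to-end sketch of the approach above, the following builds a small made-up stand-in for DF (the rows are illustrative assumptions, not the original data) and drops the first two levels by position. It also shows an equivalent form that looks the labels up through levels(), and uses droplevels() because filtering rows does not remove the now-empty levels from the factor.

```r
library(dplyr)

# Toy stand-in for DF; the values here are invented for illustration only
DF <- data.frame(
  Epsilon = factor(c("96002.txt", "96002.txt", "96004.txt",
                     "96005.txt", "96007.txt")),
  Value   = c(61914L, 61887L, 61680L, 61776L, 61753L)
)

# Drop every row belonging to the first two levels, by level position
kept <- DF %>% filter(as.integer(Epsilon) > 2)

# Equivalent form: fetch the first two labels from the level order
kept2 <- DF %>% filter(!(Epsilon %in% levels(Epsilon)[1:2]))

# filter() keeps the unused levels in the factor; droplevels() removes them
kept <- droplevels(kept)
levels(kept$Epsilon)
```

Whether to call droplevels() depends on what comes next: without it, the removed levels still appear (with zero counts) in table() output and plot legends.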
nicola answered Sep 18 '22 22:09
