
Looking for a dplyr function to apply a filter conditionally

Tags: r, dplyr, tidyverse

I have a data frame of various hematology values and their collection times. Those values should only be collected at specific times, but occasionally an extra one is added. I want to remove any instances where a value was collected outside the scheduled time.

To illustrate the issue, here's some code to create a very simplified version of the data frame I'm working with (plus some example schedules):

library(dplyr)  # provides tibble() and %>%

example <- tibble("Parameter" = c(rep("hgb", 3), rep("bili", 3), rep("LDH", 3)), 
                  "Collection" = c(1, 3, 4, 1, 5, 6, 0, 4, 8))

hgb_sampling <- c(1, 4)
bili_sampling <- c(1, 5)
ldh_sampling <- c(0, 4)

So, I need a way to conditionally apply a filter based on the value in the Parameter column. The solution needs to fit into a dplyr pipeline and yield something like this:

filtered <- tibble("Parameter" = c(rep("hgb", 2), rep("bili", 2), rep("LDH", 2)), 
                   "Collection" = c(1, 4, 1, 5, 0, 4))

I've tried a couple things (they all amount to something like the below) but the use of "Parameter" trips things up:

df <- example %>%
  {if (Parameter == "hgb") filter(., Collection %in% hgb_sampling)} 

Any suggestions?
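
For context on why the attempt above fails: inside the braces, `Parameter` is looked up as an ordinary variable instead of a column of `example`, and `if` expects a single TRUE/FALSE, not one value per row. A minimal sketch of a row-wise condition that stays inside `filter()`, assuming the three sampling vectors defined above, could look like this. The fallback branch that keeps unscheduled parameters is an assumption for illustration, not part of the original question.

```r
library(dplyr)

example <- tibble(
  Parameter  = c(rep("hgb", 3), rep("bili", 3), rep("LDH", 3)),
  Collection = c(1, 3, 4, 1, 5, 6, 0, 4, 8)
)

hgb_sampling  <- c(1, 4)
bili_sampling <- c(1, 5)
ldh_sampling  <- c(0, 4)

# case_when() is vectorised, so each row is checked against the
# schedule that matches its own Parameter value.
filtered <- example %>%
  filter(case_when(
    Parameter == "hgb"  ~ Collection %in% hgb_sampling,
    Parameter == "bili" ~ Collection %in% bili_sampling,
    Parameter == "LDH"  ~ Collection %in% ldh_sampling,
    TRUE                ~ TRUE  # assumption: keep parameters that have no schedule
  ))
```

This works for a handful of parameters, but every new parameter needs its own branch, so a lookup table scales better.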

jsgraydon asked Mar 26 '20 06:03



1 Answer

You could create a reference tibble that stores each parameter's scheduled times, join it to example, and keep only the rows whose Collection value appears in that schedule.

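library(dplyr)

# Reference table: one row per parameter, scheduled times held in a list column
ref_df <- tibble::tibble(Parameter = c("hgb", "bili", "LDH"), 
                         value  = list(c(1, 4), c(1, 5), c(0, 4)))

example %>%
  inner_join(ref_df, by = 'Parameter') %>%  # attach each row's schedule
  group_by(Parameter) %>%
  # within each parameter, keep only collections that are on its schedule
  filter(Collection %in% unique(unlist(value))) %>%
  ungroup() %>%
  select(Parameter, Collection)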

#  Parameter Collection
#  <chr>          <dbl>
#1 hgb                1
#2 hgb                4
#3 bili               1
#4 bili               5
#5 LDH                0
#6 LDH                4
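
As a possible variation on the same idea, the schedule can be stored in long format (one row per allowed Parameter/Collection pair) and applied with `semi_join()`, which keeps the rows of `example` that have a match in the reference table without adding any columns. This is only a sketch: the name `schedule` is made up for illustration, and it assumes the collection times match exactly.

```r
library(dplyr)

example <- tibble(
  Parameter  = c(rep("hgb", 3), rep("bili", 3), rep("LDH", 3)),
  Collection = c(1, 3, 4, 1, 5, 6, 0, 4, 8)
)

# Hypothetical long-format schedule: one row per allowed collection time
schedule <- tibble(
  Parameter  = c("hgb", "hgb", "bili", "bili", "LDH", "LDH"),
  Collection = c(1, 4, 1, 5, 0, 4)
)

# semi_join() filters example to rows that also appear in schedule;
# the row order of example is preserved.
filtered <- example %>%
  semi_join(schedule, by = c("Parameter", "Collection"))
```

Because no list column or grouping is involved, there is nothing to unlist or ungroup afterwards.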

Ronak Shah answered Oct 31 '22 08:10