Looking for a dplyr function to apply a filter conditionally

Tags: r, dplyr, tidyverse

I have a data frame of various hematology values and their collection times. Those values should only be collected at specific times, but occasionally an extra one is added. I want to remove any instances where a value was collected outside the scheduled time.

To illustrate the issue, here's some code to create a very simplified version of the data frame I'm working with (plus some example schedules):

library(dplyr)

example <- tibble("Parameter" = c(rep("hgb", 3), rep("bili", 3), rep("LDH", 3)), 
                  "Collection" = c(1, 3, 4, 1, 5, 6, 0, 4, 8))

hgb_sampling <- c(1, 4)
bili_sampling <- c(1, 5)
ldh_sampling <- c(0, 4)

So, I need a way to conditionally apply a filter based on the value in the Parameter column. The solution needs to fit into a dplyr pipeline and yield something like this:

filtered <- tibble("Parameter" = c(rep("hgb", 2), rep("bili", 2), rep("LDH", 2)), 
                   "Collection" = c(1, 4, 1, 5, 0, 4))

I've tried a couple things (they all amount to something like the below) but the use of "Parameter" trips things up:

df <- example %>%
  {if (Parameter == "hgb") filter(., Collection %in% hgb_sampling)}

Any suggestions?


asked Mar 26 '20 06:03

jsgraydon

1 Answer

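Your attempt fails because inside the braces Parameter is looked up as a standalone object rather than as a column of example, and if() expects a single TRUE or FALSE, not one value per row. Instead, you could create a reference tibble that stores each parameter's schedule, join it with example, and keep only the rows whose Collection time appears in the schedule for that parameter. Note that inner_join also drops any parameter that has no entry in the reference tibble.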

library(dplyr)

ref_df <- tibble::tibble(Parameter = c("hgb","bili", "LDH"), 
                         value  = list(c(1, 4), c(1, 5), c(0, 4)))

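example %>%
  # attach each parameter's schedule as a list-column
  inner_join(ref_df, by = 'Parameter') %>%
  # within a group every row carries the same schedule, so unlist() recovers it
  group_by(Parameter) %>%
  filter(Collection %in% unique(unlist(value))) %>%
  ungroup() %>%
  select(Parameter, Collection)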

#  Parameter Collection
#  <chr>          <dbl>
#1 hgb                1
#2 hgb                4
#3 bili               1
#4 bili               5
#5 LDH                0
#6 LDH                4
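
As an alternative, here is a minimal sketch of the same idea using a long-format schedule and semi_join, which keeps only the rows of example that have a matching (Parameter, Collection) pair and adds no extra columns. The names schedules, schedule_long and filtered_alt are made up for illustration, and the sketch assumes the collection times match the schedule exactly (no floating-point rounding differences).

```r
library(dplyr)

# Same example data as in the question
example <- tibble(Parameter  = c(rep("hgb", 3), rep("bili", 3), rep("LDH", 3)),
                  Collection = c(1, 3, 4, 1, 5, 6, 0, 4, 8))

# Hypothetical named list: one vector of scheduled times per parameter
schedules <- list(hgb = c(1, 4), bili = c(1, 5), LDH = c(0, 4))

# One row per allowed (Parameter, Collection) pair
schedule_long <- tibble(
  Parameter  = rep(names(schedules), lengths(schedules)),
  Collection = unlist(schedules, use.names = FALSE)
)

# semi_join keeps rows of example with a match in schedule_long
filtered_alt <- example %>%
  semi_join(schedule_long, by = c("Parameter", "Collection"))

filtered_alt
```

Because the schedule is an ordinary table, adding a parameter or a time point only means adding rows to it; the pipeline itself does not change.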


answered Oct 31 '22 08:10

Ronak Shah
