I have a data frame of various hematology values and their collection times. Those values should only be collected at specific times, but occasionally an extra one is added. I want to remove any instances where a value was collected outside the scheduled time.
To illustrate the issue, here's some code to create a very simplified version of the data frame I'm working with (plus some example schedules):
example <- tibble("Parameter" = c(rep("hgb", 3), rep("bili", 3), rep("LDH", 3)),
"Collection" = c(1, 3, 4, 1, 5, 6, 0, 4, 8))
hgb_sampling <- c(1, 4)
bili_sampling <- c(1, 5)
ldh_sampling <- c(0, 4)
So, I need an way to conditionally apply a filter based on the value in the Parameter column. The solution needs to fit into a dyplr pipeline and yield something like this:
filtered <- tibble("Parameter" = c(rep("hemoglobin", 2), rep("bilirubin", 2), rep("LDH", 2)),
"Collection" = c(1, 4, 1, 5, 0, 4))
I've tried a couple things (they all amount to something like the below) but the use of "Parameter" trips things up:
df <- example %>%
{if (Parameter == "hgb") filter(., Collection %in% hgb_sampling)}
Any suggestions?
Of course, dplyr has 'filter()' function to do such filtering, but there is even more. With dplyr you can do the kind of filtering, which could be hard to perform or complicated to construct with tools like SQL and traditional BI tools, in such a simple and more intuitive way.
The filter() method in R is used to subset a data frame based on a provided condition. If a row satisfies the condition, it must produce TRUE . Otherwise, non-satisfying rows will return NA values. Hence, the row will be dropped.
The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, the row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA the row will be dropped, unlike base subsetting with [ .
You could create a reference tibble, join it with example
and keep only selected rows.
library(dplyr)
ref_df <- tibble::tibble(Parameter = c("hgb","bili", "LDH"),
value = list(c(1, 4), c(1, 5), c(0, 4)))
example %>%
inner_join(ref_df, by = 'Parameter') %>%
group_by(Parameter) %>%
filter(Collection %in% unique(unlist(value))) %>%
select(Parameter, Collection)
# Parameter Collection
# <chr> <dbl>
#1 hgb 1
#2 hgb 4
#3 bili 1
#4 bili 5
#5 LDH 0
#6 LDH 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With