I am working with medical claims data to create inpatient episodes. There isn't an 'episode identifier' column within the dataset. My intention is to create a unique identifier for each episode to tie the claims to, which I can handle after I can properly identify the correct admissions and discharge dates for each episode. To keep it simple, here is a table of fake data that reflects a situation that I'm struggling with:
| Patient ID | Admitted Date | Discharge Date |
|---|---|---|
| 810 | 2020-12-15 | 2020-12-16 |
| 810 | 2021-06-17 | 2021-06-19 |
| 810 | 2021-06-19 | 2021-06-27 |
| 810 | 2021-06-27 | 2021-07-03 |
With this example dataframe, the first row shows a simple inpatient episode. Lines two through four have admission dates and discharge dates that are tied together. This is due to patients switching hospital divisions, initiating a new REV code.
I had originally used an ifelse statement that inherently failed. I used that without thinking of situations like this, where there are more two lines needed to be grouped as an episode.
Does anyone have any recommendations for packages/resources to use in order to turn lines 2-4 into a single row saying
| Patient ID | Admitted Date | Discharge Date |
|---|---|---|
| 810 | 2020-12-15 | 2020-12-16 |
| 810 | 2021-06-17 | 2021-07-03 |
Thanks! Let me know if any further explanation is needed.
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
tribble(
~patient, ~admitted, ~discharge,
810, "2020-12-15", "2020-12-16",
810, "2021-06-17", "2021-06-19",
810, "2021-06-19", "2021-06-27",
810, "2021-06-27", "2021-07-03"
) |>
mutate(across(-patient, ymd),
group = if_else(discharge == lead(admitted) | admitted == lag(discharge), 1, 0)) |>
group_by(patient, group) |>
summarise(admitted = first(admitted), discharge = last(discharge)) |>
arrange(admitted)
#> `summarise()` has grouped output by 'patient'. You can override using the
#> `.groups` argument.
#> # A tibble: 2 × 4
#> # Groups: patient [1]
#> patient group admitted discharge
#> <dbl> <dbl> <date> <date>
#> 1 810 NA 2020-12-15 2020-12-16
#> 2 810 1 2021-06-17 2021-07-03
Created on 2022-05-10 by the reprex package (v2.0.1)
This is exactly what I created the ivs package for. It allows you to work with interval vectors like what you have here. You can solve this using iv_groups(), which computes the non-overlapping "groups" of stay dates per patient.
library(dplyr)
library(ivs)
df <- tribble(
~Patient.ID, ~Admitted.Date, ~Discharge.Date,
810L, "2020-12-15", "2020-12-16",
810L, "2021-06-17", "2021-06-19",
810L, "2021-06-19", "2021-06-27",
810L, "2021-06-27", "2021-07-03"
)
df <- mutate(df, Admitted.Date = as.Date(Admitted.Date))
df <- mutate(df, Discharge.Date = as.Date(Discharge.Date))
df
#> # A tibble: 4 × 3
#> Patient.ID Admitted.Date Discharge.Date
#> <int> <date> <date>
#> 1 810 2020-12-15 2020-12-16
#> 2 810 2021-06-17 2021-06-19
#> 3 810 2021-06-19 2021-06-27
#> 4 810 2021-06-27 2021-07-03
# Create an interval vector combining the hospital stay as:
# [Admitted.Date, Discharge.Date)
df <- df %>%
mutate(Stay = iv(Admitted.Date, Discharge.Date), .keep = "unused")
df
#> # A tibble: 4 × 2
#> Patient.ID Stay
#> <int> <iv<date>>
#> 1 810 [2020-12-15, 2020-12-16)
#> 2 810 [2021-06-17, 2021-06-19)
#> 3 810 [2021-06-19, 2021-06-27)
#> 4 810 [2021-06-27, 2021-07-03)
# Assuming you have multiple patients, we will group by `Patient.ID`.
# Then compute the non-overlapping interval "groups" per patient with `iv_groups()`
df %>%
group_by(Patient.ID) %>%
summarise(Stay = iv_groups(Stay), .groups = "drop")
#> # A tibble: 2 × 2
#> Patient.ID Stay
#> <int> <iv<date>>
#> 1 810 [2020-12-15, 2020-12-16)
#> 2 810 [2021-06-17, 2021-07-03)
# You can also see which "group" each stay fell in by using `iv_identify_group()`
df %>%
group_by(Patient.ID) %>%
mutate(Group = iv_identify_group(Stay))
#> # A tibble: 4 × 3
#> # Groups: Patient.ID [1]
#> Patient.ID Stay Group
#> <int> <iv<date>> <iv<date>>
#> 1 810 [2020-12-15, 2020-12-16) [2020-12-15, 2020-12-16)
#> 2 810 [2021-06-17, 2021-06-19) [2021-06-17, 2021-07-03)
#> 3 810 [2021-06-19, 2021-06-27) [2021-06-17, 2021-07-03)
#> 4 810 [2021-06-27, 2021-07-03) [2021-06-17, 2021-07-03)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With