Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grouping dates in R to create patient episodes?

I am working with medical claims data to create inpatient episodes. There isn't an 'episode identifier' column within the dataset. My intention is to create a unique identifier for each episode to tie the claims to, which I can handle after I can properly identify the correct admissions and discharge dates for each episode. To keep it simple, here is a table of fake data that reflects a situation that I'm struggling with:

Patient ID Admitted Date Discharge Date
810 2020-12-15 2020-12-16
810 2021-06-17 2021-06-19
810 2021-06-19 2021-06-27
810 2021-06-27 2021-07-03

With this example dataframe, the first row shows a simple inpatient episode. Lines two through four have admission dates and discharge dates that are tied together. This is due to patients switching hospital divisions, initiating a new REV code.

I had originally used an ifelse statement that inherently failed. I used that without thinking of situations like this, where there are more two lines needed to be grouped as an episode.

Does anyone have any recommendations for packages/resources to use in order to turn lines 2-4 into a single row saying

Patient ID Admitted Date Discharge Date
810 2020-12-15 2020-12-16
810 2021-06-17 2021-07-03

Thanks! Let me know if any further explanation is needed.

like image 341
hamil_jp1 Avatar asked Dec 30 '25 06:12

hamil_jp1


2 Answers

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

tribble(
  ~patient, ~admitted, ~discharge,
  810, "2020-12-15", "2020-12-16",
  810, "2021-06-17", "2021-06-19",
  810, "2021-06-19", "2021-06-27",
  810, "2021-06-27", "2021-07-03"
) |>
  mutate(across(-patient, ymd),
         group = if_else(discharge == lead(admitted) | admitted == lag(discharge), 1, 0)) |> 
  group_by(patient, group) |>
  summarise(admitted = first(admitted), discharge = last(discharge)) |> 
  arrange(admitted)
#> `summarise()` has grouped output by 'patient'. You can override using the
#> `.groups` argument.
#> # A tibble: 2 × 4
#> # Groups:   patient [1]
#>   patient group admitted   discharge 
#>     <dbl> <dbl> <date>     <date>    
#> 1     810    NA 2020-12-15 2020-12-16
#> 2     810     1 2021-06-17 2021-07-03

Created on 2022-05-10 by the reprex package (v2.0.1)

like image 76
Carl Avatar answered Jan 03 '26 14:01

Carl


This is exactly what I created the ivs package for. It allows you to work with interval vectors like what you have here. You can solve this using iv_groups(), which computes the non-overlapping "groups" of stay dates per patient.

library(dplyr)
library(ivs)

df <- tribble(
  ~Patient.ID, ~Admitted.Date, ~Discharge.Date,
  810L,   "2020-12-15",    "2020-12-16",
  810L,   "2021-06-17",    "2021-06-19",
  810L,   "2021-06-19",    "2021-06-27",
  810L,   "2021-06-27",    "2021-07-03"
)
df <- mutate(df, Admitted.Date = as.Date(Admitted.Date))
df <- mutate(df, Discharge.Date = as.Date(Discharge.Date))
df
#> # A tibble: 4 × 3
#>   Patient.ID Admitted.Date Discharge.Date
#>        <int> <date>        <date>        
#> 1        810 2020-12-15    2020-12-16    
#> 2        810 2021-06-17    2021-06-19    
#> 3        810 2021-06-19    2021-06-27    
#> 4        810 2021-06-27    2021-07-03

# Create an interval vector combining the hospital stay as:
# [Admitted.Date, Discharge.Date)
df <- df %>%
  mutate(Stay = iv(Admitted.Date, Discharge.Date), .keep = "unused")

df
#> # A tibble: 4 × 2
#>   Patient.ID                     Stay
#>        <int>               <iv<date>>
#> 1        810 [2020-12-15, 2020-12-16)
#> 2        810 [2021-06-17, 2021-06-19)
#> 3        810 [2021-06-19, 2021-06-27)
#> 4        810 [2021-06-27, 2021-07-03)

# Assuming you have multiple patients, we will group by `Patient.ID`.
# Then compute the non-overlapping interval "groups" per patient with `iv_groups()`
df %>%
  group_by(Patient.ID) %>%
  summarise(Stay = iv_groups(Stay), .groups = "drop")
#> # A tibble: 2 × 2
#>   Patient.ID                     Stay
#>        <int>               <iv<date>>
#> 1        810 [2020-12-15, 2020-12-16)
#> 2        810 [2021-06-17, 2021-07-03)

# You can also see which "group" each stay fell in by using `iv_identify_group()`
df %>%
  group_by(Patient.ID) %>%
  mutate(Group = iv_identify_group(Stay))
#> # A tibble: 4 × 3
#> # Groups:   Patient.ID [1]
#>   Patient.ID                     Stay                    Group
#>        <int>               <iv<date>>               <iv<date>>
#> 1        810 [2020-12-15, 2020-12-16) [2020-12-15, 2020-12-16)
#> 2        810 [2021-06-17, 2021-06-19) [2021-06-17, 2021-07-03)
#> 3        810 [2021-06-19, 2021-06-27) [2021-06-17, 2021-07-03)
#> 4        810 [2021-06-27, 2021-07-03) [2021-06-17, 2021-07-03)
like image 33
Davis Vaughan Avatar answered Jan 03 '26 14:01

Davis Vaughan



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!