
Group date variable when dates are close

Tags: date, r, grouping

I am trying to write a function, or use cut, to assign a grouping variable to date data when the dates are close to each other (with "close" defined by the user). For example, I would like to create a common grouping variable for samples that were collected on consecutive dates. I thought cut would work here, but then I realized that cut doesn't group values because they are close to each other; it creates a series of groups based on a fixed sequence of breaks.

So take this dataframe for example:

df <- structure(list(Num = c(0.888401849195361, 0.185766335576773, 
0.493163562379777, 0.13070688676089, 0.484760325402021, 0.603240836178884, 
0.893201333936304, 0.641203448642045, 0.16957180458121, 0.0101411847863346
), Date = structure(c(10592, 10597, 10598, 10605, 10606, 10608, 
10609, 10616, 10617, 10618), class = "Date"), day = c(1L, 6L, 
7L, 14L, 15L, 17L, 18L, 25L, 26L, 27L)), .Names = c("Num", "Date", 
"day"), row.names = c(NA, -10L), class = "data.frame")

If I were to apply cut as I understand its usage, like this:

df$cutVar <- cut(df$day, breaks= seq(0, 31, by = 1), right=TRUE) 

I would be left with breaks that run right through values I'd prefer to be grouped together. For example, the 6th and 7th should be grouped together based on their proximity to each other, and likewise the 14th and 15th, and so on.

> df
          Num       Date day  cutVar
1  0.88840185 1999-01-01   1   (0,1]
2  0.18576634 1999-01-06   6   (5,6]
3  0.49316356 1999-01-07   7   (6,7]
4  0.13070689 1999-01-14  14 (13,14]
5  0.48476033 1999-01-15  15 (14,15]
6  0.60324084 1999-01-17  17 (16,17]
7  0.89320133 1999-01-18  18 (17,18]
8  0.64120345 1999-01-25  25 (24,25]
9  0.16957180 1999-01-26  26 (25,26]
10 0.01014118 1999-01-27  27 (26,27]
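
To make the limitation concrete, here is a small sketch showing that widening the bins does not help either. The 7-day bin width and the name `weekly` are assumptions chosen purely for illustration; they do not come from the question.

```r
# Day-of-month values from the example data frame above
day <- c(1L, 6L, 7L, 14L, 15L, 17L, 18L, 25L, 26L, 27L)

# Hypothetical 7-day bins; the width is an arbitrary choice for illustration
weekly <- cut(day, breaks = seq(0, 35, by = 7), right = TRUE)

# Days 14 and 15 are adjacent, yet they land in different bins,
# because the breaks are fixed in advance and ignore gaps in the data
data.frame(day, weekly)
```

Whatever fixed width is chosen, some pair of neighbouring dates can end up straddling a break, which is why the grouping needs to be driven by the gaps between dates instead.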

So the basic question is: how can I group a continuous variable (a date in this instance) so that values that are close together, with "close" defined by the user, fall into the same factor level?

boshek asked Oct 28 '25 12:10


1 Answer

Is this what you're after? Here 3 is a threshold (in days) that I chose for convenience; it can be any number you prefer:

# Start a new group wherever the gap to the previous date is at least 3 days
df$group <- cumsum(c(1, diff(df$Date)) >= 3)
df
          Num       Date day group
1  0.88840185 1999-01-01   1     0
2  0.18576634 1999-01-06   6     1
3  0.49316356 1999-01-07   7     1
4  0.13070689 1999-01-14  14     2
5  0.48476033 1999-01-15  15     2
6  0.60324084 1999-01-17  17     2
7  0.89320133 1999-01-18  18     2
8  0.64120345 1999-01-25  25     3
9  0.16957180 1999-01-26  26     3
10 0.01014118 1999-01-27  27     3
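
If it helps, the same idea can be wrapped in a small reusable function. This is only a sketch: `group_close_dates` and its `gap` argument are hypothetical names, not part of base R, and unlike the one-liner above it sorts the dates first and numbers the groups from 1 rather than 0.

```r
# Hypothetical helper (not from base R): label runs of dates whose
# consecutive gaps are all smaller than `gap` days.
group_close_dates <- function(dates, gap = 3) {
  ord <- order(dates)                         # work on sorted dates
  gaps <- as.numeric(diff(dates[ord]))        # days between neighbours
  grp_sorted <- cumsum(c(TRUE, gaps >= gap))  # new group at each large gap
  grp <- integer(length(dates))
  grp[ord] <- grp_sorted                      # restore the original order
  grp
}

# The same ten dates as in the question's data frame
dates <- as.Date("1999-01-01") + c(0, 5, 6, 13, 14, 16, 17, 24, 25, 26)
group_close_dates(dates, gap = 3)
# 1 2 2 3 3 3 3 4 4 4
```

Wrapping the result in `factor()` gives a grouping factor, and changing `gap` changes what counts as "close".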
Psidom answered Oct 31 '25 03:10


