Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count number of days in a date range in R

Tags:

r

I need help counting number of days in a given date range. Here is my data set:

dat<- data.frame(a=c(seq(as.Date("2018-01-01"), as.Date("2018-01-3"), 1),
                 seq(as.Date("2018-01-08"), as.Date("2018-01-10"), 1),
                 seq(as.Date("2018-01-23"), as.Date("2018-01-31"), 1),
                 seq(as.Date("2018-03-01"), as.Date("2018-03-05"), 1)), 
             b= c(rep("x",5), rep("y",5), rep("x",5), rep("y",5)) )



        a     b
1  2018-01-01 x
2  2018-01-02 x
3  2018-01-03 x
4  2018-01-08 x
5  2018-01-09 x
6  2018-01-10 y
7  2018-01-23 y
8  2018-01-24 y
9  2018-01-25 y
10 2018-01-26 y
11 2018-01-27 x
12 2018-01-28 x
13 2018-01-29 x
14 2018-01-30 x
15 2018-01-31 x
16 2018-03-01 y
17 2018-03-02 y
18 2018-03-03 y
19 2018-03-04 y
20 2018-03-05 y

This is reports received from a ship and "x" and "y" are different types of fuel. On 01,02 and 03 Jan, ship reported it was using "x" type fuel. Then ship didn't report anything on the 4th, 5th, 6th and 7th of Jan. Ship sent another report on 8th (which is a consolidation of the 4th, 5th, 6th, 7th and 8th of Jan) that it is still using fuel type "x". If ship changes its fuel type to "y", it will send out a report.

I want to count number of days when the fuel type is "x" and number of days when fuel type is "y". If there is a gap between the dates like for example

1  2018-01-01 x
2  2018-01-02 x
3  2018-01-03 x
4  2018-01-08 x
5  2018-01-09 x

then the number of days between the 1st and 5th row should be 8 days (09/Jan -01/Jan). So count of "x" is 8 days Then it should calculate the next counter in column b which is "y".

6  2018-01-10 y
7  2018-01-23 y
8  2018-01-24 y
9  2018-01-25 y
10 2018-01-26 y

Here diff in days is 16 days (26/Jan-10-Jan). So count of "y" is 16 days.

Then again we have "x":

11 2018-01-27 x
12 2018-01-28 x
13 2018-01-29 x
14 2018-01-30 x
15 2018-01-31 x

Here, count of "x" is 4 days (31/Jan-27/Jan). So total count of "x" is (8+4)= 12 days. And we count similarly of "y".

16 2018-03-01 y
17 2018-03-02 y
18 2018-03-03 y
19 2018-03-04 y
20 2018-03-05 y

Here is the catch. Ship didn't report anything in February. Since last report fuel type use was "x", which was reported on 31/Jan, it means that entire February, ship was using fuel type "x" and therefore we need to add 28 days of february to "x" which makes it (8+4+28)= 40 days

And "y" count is (16+4)=21 days

I can't seem to understand how to code the logic. Any help would be appreciated.

like image 718
Harkaran Saini Avatar asked Jan 01 '23 05:01

Harkaran Saini


1 Answers

data.table approach

library(data.table)
#create sample data
dt1 <- setDT(dat)
#create a data.table with one row for each day within the range of dt1
dt2 <- data.table( a = seq( min( dt1$a ), max( dt1$a), by = "days") )

#perform rolling join to get the last 'b' from dt1 on all dates in dt2
dt2[, b := dt1[dt2, b, on = "a", roll = TRUE]][]
#summarise by b (number of rows = number of days, so we can use .N)
dt2[, (days = .N), by = "b"]
#    b  N
# 1: x 42
# 2: y 22
like image 73
Wimpel Avatar answered Jan 11 '23 07:01

Wimpel