Split date periods by months and add rows in case of necessity

Tags: date, r

I have a table:

Name| Start     | Finish    |
----|-----------|-----------|
A   |2015-01-22 |2015-02-04 |
B   |2015-01-02 |2015-01-10 |
A   |2015-01-22 |2015-02-14 |
B   |2015-01-02 |2015-02-10 |

I need to break the periods up by month. If a period starts in one month and ends in the next, it should be split into two periods. If a period starts and ends in the same month, it should stay as it is. Assume a period cannot contain more than one 1st day of a month; in other words, each row is split into at most two rows. Finish (the end of the period) is always later than Start.

That's what I want to get:

Name| Start     | Finish    |
----|-----------|-----------|
A   |2015-01-22 |2015-01-31 |
A   |2015-02-01 |2015-02-04 |
A   |2015-01-22 |2015-01-31 |
A   |2015-02-01 |2015-02-14 |
B   |2015-01-02 |2015-01-10 |
B   |2015-01-02 |2015-01-31 |
B   |2015-02-01 |2015-02-10 |

The order of the output rows does not matter.

Here is the code to create the table:

Name = c("A", "B", "A", "B")
Start = c(as.Date("2015-01-22"), as.Date("2015-01-02"), as.Date("2015-01-22"), as.Date("2015-01-02"))
Finish = c(as.Date("2015-02-04"), as.Date("2015-01-10"), as.Date("2015-02-14"), as.Date("2015-02-10"))
df = data.frame(Name, Start, Finish)

Any suggestions on how this can be done?

Serhii asked Sep 12 '17 21:09


3 Answers

The question has been changed. Originally the Name column uniquely identified the row but the changed version of the question no longer has that. The answer here has been modified accordingly so that now we identify rows by row number, i.e. 1:nrow(df), rather than df$Name in the second argument to by. Otherwise, code is unchanged.

Use by to split the data frame into single rows and apply the anonymous function to each one. The function calculates the end of month (eom) for Start; if Finish is later than eom it returns a two-row data frame, otherwise it returns the row unchanged. The pieces are then combined with rbind.

library(zoo)

do.call("rbind", by(df, 1:nrow(df), function(x) with(x, {
   eom <- as.Date(as.yearmon(Start), frac = 1)
   if (eom < Finish)
      data.frame(Name, Start = c(Start, eom+1), Finish = c(eom, Finish))
   else x
})))

giving:

    Name      Start     Finish
1.1    A 2015-01-22 2015-01-31
1.2    A 2015-02-01 2015-02-04
2      B 2015-01-02 2015-01-10
3.1    A 2015-01-22 2015-01-31
3.2    A 2015-02-01 2015-02-14
4.1    B 2015-01-02 2015-01-31
4.2    B 2015-02-01 2015-02-10
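
For readers who do not have zoo installed, the end-of-month step can also be sketched in base R. The helper name end_of_month below is hypothetical and not part of the answer above; it assumes a single Date as input, steps to the first day of the following month with seq(), and subtracts one day.

```r
# Hypothetical helper (not from the answer above): last day of the month
# containing a single Date `d`, using base R only.
end_of_month <- function(d) {
  first <- as.Date(format(d, "%Y-%m-01"))          # first day of d's month
  seq(first, by = "month", length.out = 2)[2] - 1  # first of next month, minus one day
}

end_of_month(as.Date("2015-01-22"))  # 2015-01-31
end_of_month(as.Date("2016-02-10"))  # 2016-02-29 (leap year)
```

Because seq() needs a length-one starting date, this sketch would have to be applied row by row, which matches how the by() call above already works.
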
G. Grothendieck answered Nov 08 '22 01:11


Here's another approach in base R:

idx <- with(df, format(Finish, "%Y-%m") > format(Start, "%Y-%m"))

rbind(df[!idx,],
   transform(df[idx,], Finish = as.Date(paste0(format(Finish, "%Y-%m"), "-01"))-1),
   transform(df[idx,], Start  = as.Date(paste0(format(Finish, "%Y-%m"), "-01"))))

#   Name      Start     Finish
#2     B 2015-01-02 2015-01-10
#1     A 2015-01-22 2015-01-31
#3     A 2015-01-22 2015-01-31
#4     B 2015-01-02 2015-01-31
#11    A 2015-02-01 2015-02-04
#31    A 2015-02-01 2015-02-14
#41    B 2015-02-01 2015-02-10
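
One way to sanity-check a split like this is to confirm that every output row stays within a single month and that the total number of days covered is unchanged. The sketch below rebuilds the question's data, repeats the same logic, and runs both checks; the names first_of_finish, res and days are made up for illustration and do not appear in the answer.

```r
# Rebuild the question's data frame.
df <- data.frame(
  Name   = c("A", "B", "A", "B"),
  Start  = as.Date(c("2015-01-22", "2015-01-02", "2015-01-22", "2015-01-02")),
  Finish = as.Date(c("2015-02-04", "2015-01-10", "2015-02-14", "2015-02-10"))
)

# Same splitting logic as above; first_of_finish is an illustrative helper
# returning the first day of the month that a Date falls in.
idx <- with(df, format(Finish, "%Y-%m") > format(Start, "%Y-%m"))
first_of_finish <- function(x) as.Date(paste0(format(x, "%Y-%m"), "-01"))
res <- rbind(df[!idx, ],
             transform(df[idx, ], Finish = first_of_finish(Finish) - 1),
             transform(df[idx, ], Start  = first_of_finish(Finish)))

# Check 1: every output row starts and ends in the same month.
same_month <- with(res, format(Start, "%Y-%m") == format(Finish, "%Y-%m"))

# Check 2: the number of days covered (inclusive) is unchanged by the split.
days <- function(d) sum(as.integer(d$Finish - d$Start) + 1L)

stopifnot(all(same_month), days(res) == days(df))
```

These checks only hold under the question's assumption that no period crosses more than one month boundary.
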
talat answered Nov 08 '22 01:11


Edit:

This answers the original question:

library(dplyr)
library(zoo)

df %>%
  filter(Finish > as.Date(as.yearmon(Start), frac = 1)) %>%
  group_by(Name) %>%
  do(rbind(., c(.$Name,
                paste(as.Date(as.yearmon(.$Start), frac = 1) + 1),
                .$Finish))) %>%
  mutate(Finish = ifelse(as.Date(as.yearmon(Start), frac = 1) < Finish,
                         paste(as.Date(as.yearmon(Start), frac = 1)), Finish))

Output:

   Name      Start     Finish
1     A 2015-01-22 2015-01-31
2     A 2015-02-01 2015-02-04
3     B 2015-03-02 2015-03-31
4     B 2015-04-01 2015-04-10

Sample data:

require(data.table)
df <- fread("Name Start      Finish    
             A    2015-01-22 2015-02-01 
             B    2015-03-02 2015-04-01")
www answered Nov 08 '22 02:11