I have a table:
Name | Start      | Finish     |
-----|------------|------------|
A    | 2015-01-22 | 2015-02-04 |
B    | 2015-01-02 | 2015-01-10 |
A    | 2015-01-22 | 2015-02-14 |
B    | 2015-01-02 | 2015-02-10 |
I need to break the periods up by month. If a period starts in one month and ends in the next, I need to split it into two periods. If a period starts and ends in the same month, it should stay as it is. Assume a period never contains more than one first day of a month; in other words, each row is split into at most two rows. Finish (the end of the period) is always later than Start.
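The splitting rule for a single period can be sketched in base R as follows. `split_period` is a hypothetical helper (not from any package), and it relies on the stated assumption that a period crosses at most one month boundary.

```r
# Hypothetical helper: split one period at the end of Start's month.
# Assumes the period contains at most one 1st-of-month, as stated in the question.
split_period <- function(start, finish) {
  first <- as.Date(format(start, "%Y-%m-01"))                 # 1st of start's month
  next_first <- seq(first, by = "month", length.out = 2)[2]   # 1st of the following month
  eom <- next_first - 1                                       # last day of start's month
  if (finish > eom) {
    # crosses a month boundary: two pieces
    data.frame(Start = c(start, next_first), Finish = c(eom, finish))
  } else {
    # starts and ends in the same month: unchanged
    data.frame(Start = start, Finish = finish)
  }
}

split_period(as.Date("2015-01-22"), as.Date("2015-02-04"))
# two rows: 2015-01-22 to 2015-01-31 and 2015-02-01 to 2015-02-04
```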
That's what I want to get:
Name | Start      | Finish     |
-----|------------|------------|
A    | 2015-01-22 | 2015-01-31 |
A    | 2015-02-01 | 2015-02-04 |
A    | 2015-01-22 | 2015-01-31 |
A    | 2015-02-01 | 2015-02-14 |
B    | 2015-01-02 | 2015-01-10 |
B    | 2015-01-02 | 2015-01-31 |
B    | 2015-02-01 | 2015-02-10 |
The order of the output rows doesn't matter.
Here is the code to create the table:
Name = c("A", "B", "A", "B")
Start = c(as.Date("2015-01-22"), as.Date("2015-01-02"), as.Date("2015-01-22"), as.Date("2015-01-02"))
Finish = c(as.Date("2015-02-04"), as.Date("2015-01-10"), as.Date("2015-02-14"), as.Date("2015-02-10"))
df = data.frame(Name, Start, Finish)
Any suggestions on how this can be done?
The question has been changed. Originally the Name column uniquely identified each row, but in the changed version of the question it no longer does. This answer has been modified accordingly: rows are now identified by row number, i.e. 1:nrow(df), rather than by df$Name in the second argument to by. Otherwise, the code is unchanged.
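To see why the grouping variable matters once Name is no longer unique, compare the groups produced by split(), which partitions the rows in the same way as the grouping argument of by(): grouping by Name puts two rows in each group, while grouping by row number gives one row per group.

```r
df <- data.frame(
  Name   = c("A", "B", "A", "B"),
  Start  = as.Date(c("2015-01-22", "2015-01-02", "2015-01-22", "2015-01-02")),
  Finish = as.Date(c("2015-02-04", "2015-01-10", "2015-02-14", "2015-02-10"))
)

# Grouping by Name: rows 1 and 3 land in the same group, as do rows 2 and 4
sapply(split(df, df$Name), nrow)
# A B
# 2 2

# Grouping by row number: every group is a single row
sapply(split(df, seq_len(nrow(df))), nrow)
# 1 2 3 4
# 1 1 1 1
```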
Use by to split the data frame by row, so that the anonymous function operates on one single-row data frame at a time. For each row it calculates the end of the month (eom) containing Start; if Finish is later than that, it outputs a two-row data frame, otherwise it returns the row unchanged. Finally, combine all the pieces with rbind.
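If you would rather not depend on zoo, the end-of-month step can be computed in base R by taking the first day of the following month and subtracting one day. This is a sketch: `end_of_month` is a hypothetical name, and it is intended to give the same dates as `as.Date(as.yearmon(x), frac = 1)`.

```r
# Hypothetical base-R helper: last day of the month of each date in d (vectorised).
end_of_month <- function(d) {
  y <- as.integer(format(d, "%Y"))
  m <- as.integer(format(d, "%m"))
  # 1st of the following month; December rolls over to January of the next year
  next_first <- as.Date(sprintf("%04d-%02d-01", y + (m == 12), m %% 12 + 1))
  next_first - 1
}

end_of_month(as.Date(c("2015-01-22", "2016-02-10", "2015-12-05")))
# returns 2015-01-31, 2016-02-29 (leap year), 2015-12-31
```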
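library(zoo)

do.call("rbind", by(df, 1:nrow(df), function(x) with(x, {
  # last day of the month containing Start
  eom <- as.Date(as.yearmon(Start), frac = 1)
  if (eom < Finish)
    # period crosses a month boundary: split it into [Start, eom] and [eom + 1, Finish]
    data.frame(Name, Start = c(Start, eom + 1), Finish = c(eom, Finish))
  else
    x  # period lies within one month: keep the row as is
})))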
giving:
Name Start Finish
1.1 A 2015-01-22 2015-01-31
1.2 A 2015-02-01 2015-02-04
2 B 2015-01-02 2015-01-10
3.1 A 2015-01-22 2015-01-31
3.2 A 2015-02-01 2015-02-14
4.1 B 2015-01-02 2015-01-31
4.2 B 2015-02-01 2015-02-10
Here's another approach in base R:
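# TRUE for rows whose Finish falls in a later month than Start
idx <- with(df, format(Finish, "%Y-%m") > format(Start, "%Y-%m"))
rbind(df[!idx, ],   # rows within one month, unchanged
      # first piece: ends the day before the 1st of Finish's month
      transform(df[idx, ], Finish = as.Date(paste0(format(Finish, "%Y-%m"), "-01")) - 1),
      # second piece: starts on the 1st of Finish's month
      transform(df[idx, ], Start = as.Date(paste0(format(Finish, "%Y-%m"), "-01"))))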
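This base-R version relies on two facts, illustrated below with a few made-up dates (one pair crosses a year boundary). First, zero-padded "YYYY-MM" strings compare in chronological order, so `>` detects that Finish lies in a later month. Second, given at most one month boundary per period, the first day of Finish's month minus one day is the last day of Start's month.

```r
# Made-up example periods; the second one crosses from December into January
s <- as.Date(c("2015-01-22", "2015-12-20", "2015-01-02"))
f <- as.Date(c("2015-02-04", "2016-01-03", "2015-01-10"))

# "YYYY-MM" strings are zero-padded, so string order matches calendar order
idx <- format(f, "%Y-%m") > format(s, "%Y-%m")
idx
# [1]  TRUE  TRUE FALSE

# The 1st of Finish's month starts the second piece ...
first_of_finish <- as.Date(paste0(format(f, "%Y-%m"), "-01"))
first_of_finish[idx]
# [1] "2015-02-01" "2016-01-01"

# ... and the day before it ends the first piece
first_of_finish[idx] - 1
# [1] "2015-01-31" "2015-12-31"
```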
# Name Start Finish
#2 B 2015-01-02 2015-01-10
#1 A 2015-01-22 2015-01-31
#3 A 2015-01-22 2015-01-31
#4 B 2015-01-02 2015-01-31
#11 A 2015-02-01 2015-02-04
#31 A 2015-02-01 2015-02-14
#41 B 2015-02-01 2015-02-10
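Since splitting should only cut periods, never add or drop days, one quick sanity check is to compare the total number of covered days (counting both Start and Finish) before and after. The sketch below rebuilds the question's data, applies the base-R approach, and compares the totals; `covered_days` is a hypothetical helper name.

```r
df <- data.frame(
  Name   = c("A", "B", "A", "B"),
  Start  = as.Date(c("2015-01-22", "2015-01-02", "2015-01-22", "2015-01-02")),
  Finish = as.Date(c("2015-02-04", "2015-01-10", "2015-02-14", "2015-02-10"))
)

# same split as above
idx <- with(df, format(Finish, "%Y-%m") > format(Start, "%Y-%m"))
res <- rbind(df[!idx, ],
             transform(df[idx, ], Finish = as.Date(paste0(format(Finish, "%Y-%m"), "-01")) - 1),
             transform(df[idx, ], Start  = as.Date(paste0(format(Finish, "%Y-%m"), "-01"))))

# Hypothetical helper: inclusive day count, i.e. Finish - Start + 1 per row, summed
covered_days <- function(d) sum(as.numeric(d$Finish - d$Start) + 1)

covered_days(df)   # [1] 87
covered_days(res)  # [1] 87
```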
Edit:
This answers the original question:
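library(dplyr)
library(zoo)

df %>%
  # keep only periods that run past the end of Start's month
  # (rows that start and end in the same month are dropped by this step)
  filter(Finish > as.Date(as.yearmon(Start), frac = 1)) %>%
  group_by(Name) %>%
  # append a second row starting on the 1st of the next month
  do(rbind(., c(.$Name,
                paste(as.Date(as.yearmon(.$Start), frac = 1) + 1),
                .$Finish))) %>%
  # cap Finish at the end of Start's month where the period still crosses it
  mutate(Finish := ifelse(as.Date(as.yearmon(Start), frac = 1) < Finish,
                          paste(as.Date(as.yearmon(Start), frac = 1)),
                          Finish))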
Output:
Name Start Finish
1 A 2015-01-22 2015-01-31
2 A 2015-02-01 2015-02-04
3 B 2015-03-02 2015-03-31
4 B 2015-04-01 2015-04-10
Sample data:
require(data.table)
df <- fread("Name Start Finish
A 2015-01-22 2015-02-01
B 2015-03-02 2015-04-01")