I have a data frame "DF" like this:
Flight.Start Flight.End Device Partner Creative Days.in.Flight
2015-08-31 2015-08-31 Standard MSN Video 35
What I need to do is "blow it up" like so:
Flight.Start Flight.End Date Device Partner Creative Days.in.Flight
2015-08-31 2015-10-04 2015-08-31 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-01 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-02 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-03 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-04 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-05 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-06 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-07 Standard MSN Video 35
ETC...... until Date variable hits 2015-10-04, then move on to next duplication
Essentially every row gets duplicated by the amount of days in flight - 1 (since the row that already exists can account for a single day in the interval, and then a new column "Date" is filled out for the relevant dates for within that flight. So if a row has a start and end date of 9/1 and 9/5 respectively, 4 duplicate rows would be appended to the one already existing, a new column would be created (Date), and the date sequence of whatever the flight start and end dates are for the original row will fill out the column values.
All date values are formatted as date, days in flight is a num, and the rest are factors.
EDIT
In response to the duplicate question flagging:
To clarify, this is NOT like the case that has been flagged as a duplicate, because my question is not really focused on how to duplicate based on days in flight (I already know how to do that!), but rather how I can then add column to that output data frame and sequentially insert dates within the corresponding flight period. Thanks for the heads up...
Here is one way with splitstackshape
and dplyr
. Using expandRows()
from the splitstackshape
package, you can expand your data frame as you described. Then, you want to add a sequence of dates using mutate()
. What I did was to group the data by the combination of Flight.Start
and Flight.End
, and use seq()
to create a sequence of date for each group. first()
is taking the first element of Flight.Start
and Flight.End
. In this way, you can create the sequence you want. I hope this will help you.
DATA and CODE
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
Device = "Standard",
Creative = "Video",
Days.in.Flight = c(3, 6),
stringsAsFactors = FALSE)
# Flight.Start Flight.End Device Creative Days.in.Flight
#1 2015-09-01 2015-09-03 Standard Video 3
#2 2015-09-10 2015-09-15 Standard Video 6
library(splitstackshape)
library(dplyr)
expandRows(mydf, "Days.in.Flight", drop = FALSE) %>%
group_by(Flight.Start, Flight.End) %>%
mutate(Date = seq(first(Flight.Start),
first(Flight.End),
by = 1))
# Flight.Start Flight.End Device Creative Days.in.Flight Date
# (date) (date) (chr) (chr) (dbl) (date)
#1 2015-09-01 2015-09-03 Standard Video 3 2015-09-01
#2 2015-09-01 2015-09-03 Standard Video 3 2015-09-02
#3 2015-09-01 2015-09-03 Standard Video 3 2015-09-03
#4 2015-09-10 2015-09-15 Standard Video 6 2015-09-10
#5 2015-09-10 2015-09-15 Standard Video 6 2015-09-11
#6 2015-09-10 2015-09-15 Standard Video 6 2015-09-12
#7 2015-09-10 2015-09-15 Standard Video 6 2015-09-13
#8 2015-09-10 2015-09-15 Standard Video 6 2015-09-14
#9 2015-09-10 2015-09-15 Standard Video 6 2015-09-15
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With