My dataset looks like this:
unique.id abx.1 start.1 stop.1 abx.2 start.2 stop.2 abx.3 start.3 stop.3 abx.4 start.4
1 1 Moxi 2014-01-01 2014-01-07 PenG 2014-01-01 2014-01-07 Vanco 2014-01-01 2014-01-07 Moxi 2014-01-01
2 2 Moxi 2014-01-01 2014-01-02 Cipro 2014-01-01 2014-01-02 PenG 2014-01-01 2014-01-02 Vanco 2014-01-01
3 3 Cipro 2014-01-01 2014-01-05 Vanco 2014-01-01 2014-01-05 Cipro 2014-01-01 2014-01-05 Vanco 2014-01-01
4 4 Vanco 2014-01-02 2014-01-03 Cipro 2014-01-02 2014-01-03 Cipro 2014-01-02 2014-01-03 PenG 2014-01-02
5 5 Vanco 2014-01-01 2014-01-02 PenG 2014-01-01 2014-01-02 PenG 2014-01-01 2014-01-02 Cipro 2014-01-01
stop.4 intervention
1 2014-01-07 0
2 2014-01-02 0
3 2014-01-05 1
4 2014-01-03 1
5 2014-01-02 0
With some code to create this:
abxoptions <- c("Cipro", "Moxi", "PenG", "Vanco")
df3 <- data.frame(
  unique.id = 1:5,
  abx.1 = sample(abxoptions, 5, replace = TRUE),
  start.1 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
  stop.1 = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
  abx.2 = sample(abxoptions, 5, replace = TRUE),
  start.2 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
  stop.2 = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
  abx.3 = sample(abxoptions, 5, replace = TRUE),
  start.3 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
  stop.3 = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
  abx.4 = sample(abxoptions, 5, replace = TRUE),
  start.4 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
  stop.4 = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
  intervention = c(0, 0, 1, 1, 0)
)
I would like to tidy this data to look like this:
unique.id abx start stop intervention
1 Moxi 2014-01-01 2014-01-07 0
1 PenG 2014-01-01 2014-01-07 0
1 Vanco 2014-01-01 2014-01-07 0
1 Moxi 2014-01-01 2014-01-07 0
etc.
The answers to the following questions didn't get me where I needed to be: "Gather multiple sets of columns" and "Combining multiple columns into one".
I suspect that Hadley's amazing tidyr package is the way to go; I just can't figure this out. Any help would be greatly appreciated.
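Almost every data tidying problem can be solved in three steps:
1. Gather all non-variable columns.
2. Separate the gathered column-name column into multiple variables.
3. Re-spread the data.
(Often you'll only need one or two of these, but I think they're almost always in this order.)
For your data:
1. Gather all columns except unique.id and intervention.
2. Separate col into variable and number.
3. Spread variable and value back into columns.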
This looks like:
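library(tidyr)
library(dplyr)
df3 %>%
  # stack every abx/start/stop column into key-value pairs
  gather(col, value, -unique.id, -intervention) %>%
  # split names such as "abx.1" into "abx" and "1"
  separate(col, c("variable", "number")) %>%
  # one column per variable (abx, start, stop)
  spread(variable, value, convert = TRUE) %>%
  # the dates come back as day counts, so convert them to Date again
  mutate(start = as.Date(start, origin = "1970-01-01"), stop = as.Date(stop, origin = "1970-01-01"))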
Your case is a bit more complicated because you have two types of variables, so you need to restore the types at the end.
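For readers on a newer tidyr, here is a minimal sketch of a different route to the same shape. It assumes tidyr 1.0.0 or later, where pivot_longer() is available. This is not the gather/separate/spread approach above but a swapped-in alternative. The special ".value" entry in names_to tells pivot_longer() to keep the first part of each column name as an output column, so the dates never get mixed with the drug names and should not need converting back. The small data frame df is a made-up stand-in for df3 with fixed values.

```r
library(tidyr)

# Hypothetical stand-in for df3: two antibiotic slots per patient, fixed values
df <- data.frame(
  unique.id = 1:2,
  abx.1 = c("Moxi", "Cipro"),
  start.1 = as.Date(c("2014-01-01", "2014-01-02")),
  stop.1 = as.Date(c("2014-01-07", "2014-01-03")),
  abx.2 = c("PenG", "Vanco"),
  start.2 = as.Date(c("2014-01-01", "2014-01-02")),
  stop.2 = as.Date(c("2014-01-07", "2014-01-03")),
  intervention = c(0, 1),
  stringsAsFactors = FALSE
)

# Split names such as "abx.1" at the dot: the part before the dot names the
# output column (".value"), the part after it goes into a "number" column.
tidy <- pivot_longer(
  df,
  cols = -c(unique.id, intervention),
  names_to = c(".value", "number"),
  names_sep = "\\."
)

print(tidy)
```

The result has one row per patient and antibiotic slot, with columns unique.id, intervention, number, abx, start and stop. Drop number with dplyr::select(-number) if it isn't needed.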