I have a data frame like this:
id|y1|y2|y3|y4
--+--+--+--+--
a |12|13|14|
b |12|18|  |
c |13|  |  |
d |13|14|15|16
I want to reshape it so that I end up with two value columns, from and to. The above example would then become:
id|from|to
--+----+---
a |12  |13
a |13  |14
a |14  |
b |12  |18
b |18  |
c |13  |
d |13  |14
d |14  |15
d |15  |16
Each id has a 'from' and a 'to' per pair of year values.
Does anybody know of an easy way to do this? I tried using reshape2. I also looked at Combine Multiple Columns Into Tidy Data, but I think my case is different.
One solution uses dplyr and tidyr. dt2 is the final output.
# Create example data frame
dt <- data.frame(id = c("a", "b", "c", "d"),
                 y1 = c(12, 12, 13, 13),
                 y2 = c(13, 18, NA, 14),
                 y3 = c(14, NA, NA, 15),
                 y4 = c(NA, NA, NA, 16),
                 stringsAsFactors = FALSE)
# Load packages
library(dplyr)
library(tidyr)
# Process the data
dt2 <- dt %>%
  gather(STEP, from, -id) %>%
  drop_na(from) %>%
  arrange(id, STEP) %>%
  group_by(id) %>%
  mutate(to = lead(from)) %>%
  select(-STEP)
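As a variation on the pipeline above, here is a minimal sketch that uses pivot_longer() in place of gather(), which tidyr now marks as superseded. The name dt3 is only an illustrative choice, and the ungroup() call is an addition so that the result is a plain, ungrouped tibble. It assumes a tidyr version that provides pivot_longer() (1.0.0 or later).

```r
library(dplyr)
library(tidyr)

# Same example data as above
dt <- data.frame(id = c("a", "b", "c", "d"),
                 y1 = c(12, 12, 13, 13),
                 y2 = c(13, 18, NA, 14),
                 y3 = c(14, NA, NA, 15),
                 y4 = c(NA, NA, NA, 16),
                 stringsAsFactors = FALSE)

# dt3 is a hypothetical name for this variant's output
dt3 <- dt %>%
  # Stack y1..y4 into one 'from' column, dropping missing values
  pivot_longer(-id, names_to = "STEP", values_to = "from",
               values_drop_na = TRUE) %>%
  group_by(id) %>%
  # 'to' is the next 'from' value within the same id (NA for the last one)
  mutate(to = lead(from)) %>%
  ungroup() %>%
  select(-STEP)

print(dt3)
```

pivot_longer() keeps the columns of each row in their original order, so no separate arrange() step is needed here as long as y1 to y4 are already in sequence in the data frame.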
You can use lapply to loop over the pairs of columns and rbind to union them:
# dt is the example data frame with columns id, y1, ..., y4
do.call(rbind,
        lapply(2:(length(dt) - 1),
               function(x) setNames(dt[!is.na(dt[, x]), c(1, x, x + 1)],
                                    c("id", "from", "to"))))
id from to
1 a 12 13
2 b 12 18
3 c 13 NA
4 d 13 14
11 a 13 14
21 b 18 NA
41 d 14 15
12 a 14 NA
42 d 15 16
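The rows above come out grouped by column pair rather than by id. If the layout from the question is wanted, a minimal base R sketch is to sort the result by id while keeping the column-pair order within each id. The names res and ord are illustrative only.

```r
# Same example data as in the question
dt <- data.frame(id = c("a", "b", "c", "d"),
                 y1 = c(12, 12, 13, 13),
                 y2 = c(13, 18, NA, 14),
                 y3 = c(14, NA, NA, 15),
                 y4 = c(NA, NA, NA, 16),
                 stringsAsFactors = FALSE)

# Build one (id, from, to) block per adjacent column pair and stack them
res <- do.call(rbind,
               lapply(2:(length(dt) - 1),
                      function(x) setNames(dt[!is.na(dt[, x]), c(1, x, x + 1)],
                                           c("id", "from", "to"))))

# Sort by id; the row position breaks ties so pairs stay in y1, y2, ... order
ord <- order(res$id, seq_len(nrow(res)))
res <- res[ord, ]
rownames(res) <- NULL  # replace the 11, 21, ... row names with 1..n

print(res)
```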