I am trying to expand a series held in one column of a data frame and repeat the values in the remaining columns to fill in the new rows. The expansion has to happen within each level of a grouping variable, and the pieces then need to be recombined into a single data frame. Here's an example of what I mean, and how I'd do it piecewise:
library(dplyr)  # needed for right_join()
df <- data.frame(group = c(rep("A", 3), rep("B", 3)), val = rep(c(1, 3, 5), 2))
values <- data.frame(val = 1:5)
df2.a <- df[df$group=="A",]
df3.a <- right_join(df2.a, values, "val")
df3.a$group <- "A"
df2.b <- df[df$group=="B",]
df3.b <- right_join(df2.b, values, "val")
df3.b$group <- "B"
df4 <- rbind(df3.a, df3.b)
Here, df4 is my desired output.
But I'm sure I can be way more efficient using dplyr or some other split-apply-combine approach, though I'm clearly missing something.
Conceptually, this makes sense to me:
df.interp <- df %>%
  group_by(group) %>%
  full_join(x = ., y = values, by = "val") %>%
  fill(group)
I can't complete the last line, though, because I can't edit the grouping variable. But if I ungroup first, I'm no longer operating on one group at a time: the join adds too few new rows (one per missing value overall, not one per missing value per group), and I fill them with the wrong group.
I'm sure I'm missing something simple here...what is it?
You can use complete() from tidyr, applied within each group:
library(dplyr)
library(tidyr)
df %>%
  group_by(group) %>%
  complete(val = min(val):max(val))
# # A tibble: 10 x 2
# # Groups:   group [2]
#    group   val
#    <fct> <dbl>
#  1 A         1
#  2 A         2
#  3 A         3
#  4 A         4
#  5 A         5
#  6 B         1
#  7 B         2
#  8 B         3
#  9 B         4
# 10 B         5
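Note that `min(val):max(val)` only fills gaps inside each group's own range. If the target values come from a separate lookup table, as with `values` in the question, and there are other columns whose values should be repeated into the new rows, something like the following may work. This is a minimal sketch rather than a definitive solution: the extra column `site` and the result name `out` are hypothetical and only there for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question, plus a hypothetical extra column `site`
df <- data.frame(
  group = c(rep("A", 3), rep("B", 3)),
  val   = rep(c(1, 3, 5), 2),
  site  = c(rep("north", 3), rep("south", 3))
)
values <- data.frame(val = as.numeric(1:5))

out <- df %>%
  group_by(group) %>%
  # add every value from the lookup table within each group
  complete(val = values$val) %>%
  # repeat the non-key column into the newly created rows (per group)
  fill(site, .direction = "downup") %>%
  ungroup()
```

Because `fill()` runs while the data is still grouped, values are only carried over within a group, never from one group into the next.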
Adding a data.table option.
Define a helper function:
# Return the full integer sequence from min(x) to max(x)
f <- function(x) {
  tmp <- range(x)
  tmp[1]:tmp[2]
}
Apply f by group:
library(data.table)
# setDT() converts df to a data.table by reference (no copy is made)
out <- setDT(df)[, .(val = f(val)), by = group]
out
#     group val
#  1:     A   1
#  2:     A   2
#  3:     A   3
#  4:     A   4
#  5:     A   5
#  6:     B   1
#  7:     B   2
#  8:     B   3
#  9:     B   4
# 10:     B   5
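For comparison, the same split-apply-combine idea can be sketched in base R without any extra packages. This is only an illustration under the question's example data; the names `pieces` and `out_base` are introduced here and are not part of either answer.

```r
# Example data from the question
df <- data.frame(group = c(rep("A", 3), rep("B", 3)), val = rep(c(1, 3, 5), 2))

# Split by group, build the full sequence for each piece, then recombine
pieces <- lapply(split(df, df$group), function(d) {
  data.frame(group = d$group[1], val = seq(min(d$val), max(d$val)))
})
out_base <- do.call(rbind, pieces)
rownames(out_base) <- NULL  # drop the "A.1", "A.2", ... row names added by rbind
```

This mirrors the piecewise code in the question, with `split()` and `lapply()` replacing the hand-written per-group steps.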