I want to concatenate (paste
) rows of an elements in a data.frame based on indices. How can I do this efficiently? This is easy with a unique identifier but here I have indices not a factor to concatenate on. All package and base solution options welcomed.
indexes <- list(
2:3,
6:8,
11:12
)
dat <- data.frame(
x = c(1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 2, 2, 1),
y = LETTERS[1:13],
z = "PP", stringsAsFactors = FALSE
)
x y z
1 1 A PP
2 2 B PP
3 2 C PP
4 3 D PP
5 4 E PP
6 5 F PP
7 5 G PP
8 5 H PP
9 6 I PP
10 7 J PP
11 2 K PP
12 2 L PP
13 1 M PP
Desired output:
x y z
1 1 A PP
2 2 B, C PP
3 3 D PP
4 4 E PP
5 5 F, G, H PP
6 6 I PP
7 7 J PP
8 2 K, L PP
9 1 M PP
Here's one possible solution in base R:
dat[sapply(indexes, "[", 1), "y"] <- sapply(indexes, function(i) {
paste(dat[["y"]][i], collapse = ", ")
})
dat[ setdiff( 1:nrow(dat),
setdiff(
unlist(indexes),
sapply(indexes, "[", 1)
)
), ]
x y z
1 1 A PP
2 2 B, C PP
4 3 D PP
5 4 E PP
6 5 F, G, H PP
9 6 I PP
10 7 J PP
11 2 K, L PP
13 1 M PP
Another base R
method:
indx <- !(1:nrow(dat) %in% unlist(lapply(indexes, '[', -1)))
transform(dat, y=ave(y, cumsum(indx), FUN=toString))[indx,]
# x y z
# 1 1 A PP
# 2 2 B, C PP
# 4 3 D PP
# 5 4 E PP
# 6 5 F, G, H PP
# 9 6 I PP
# 10 7 J PP
# 11 2 K, L PP
# 13 1 M PP
Explanation
Some insight into how !(1:nrow(dat) %in% unlist(lapply(indexes, '[', -1)))
came about:
I tried to find an index to group by. I started from the end and worked backwards. I knew that if I could get:
1 2 2 3 4 5 5 5 6 7 8 8 9
I could use ave
and run toString
. I thought what true and false combination would be needed to make cumsum
give the above index as output. I wrote that out:
cumsum(c(T, T, F, T, T, T, F, F, T, T, T, F, T))
[1] 1 2 2 3 4 5 5 5 6 7 8 8 9
I needed to find a way to create that logical index. If all of the indexes
list elements that aren't the first were false, I'd have the logical index I needed.
unlist(lapply(indexes, '[', -1))
[1] 3 7 8 12
You'll notice that those positions are all of the false values in the index.
dplyr
I guess it's only fair to add dplyr to the mix:
dat %>%
mutate(indx = na.omit(c(T, x != lead(x)))) %>%
group_by(ind2=cumsum(indx)) %>%
mutate(y=toString(y)) %>%
filter(indx)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With