I have a data frame containing two columns of identifying information and one column of letter pairs that are separated by hyphens:
df<-data.frame(
list = rep(1:3, each = 2),
set = rep(c("A","B"), times = 3),
item = c("ab-cd","ef-gh","ij-kl","mn-op","qr-st","uv-wx")
)
What I have been struggling to accomplish is a transformation of the data frame to the following form, in which: 1. The separate rows indexed by "list" are collapsed into one row each; 2. The "item" column is split into adjacent columns with the hyphen as delimiter; 3. The "set" column serving as the basis for naming the resulting columns.
df2 <- data.frame(
list = c(1:3),
A_1 = c("ab", "ij", "qr"),
A_2 = c("cd", "kl", "st"),
B_1 = c("ef", "mn", "uv"),
B_2 = c("gh", "op", "wx"))
I've referred to a number of previously posted questions (particularly [here]) and tried to approach the solution step-by-step using the BASE transpose function along with various binding functions, along with the reshape
package and the splitstackshape
package. None of my near-solutions are elegant, and I'm wondering if there is a good way of doing this.
As always, I'm incredibly indebted to the community here for advice and feedback.
With many thanks to @AnandaMahto: it can be done in one line by first reshaping and then splitting instead of the other way around.
library(splitstackshape)
cSplit(dcast(as.data.table(df), list ~ set, value.var = "item"), c("A", "B"), "-")
We can accomplish in two steps using base R and reshape2.
First, we create columns '1' and '2' containing the split items. Normally, we wouldn't use numerical characters to start a column name with, but it saves us a step in renaming our result columns later.
df[,c("1","2")] <- do.call(rbind,strsplit(as.character(df$item),"-"))
Then we use recast
:
res <- recast(data=df, list~set+variable, measure.var=c("1","2"))
res
list A_1 A_2 B_1 B_2
1 1 ab cd ef gh
2 2 ij kl mn op
3 3 qr st uv wx
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With