Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split columns into adjacent columns, use row name as new column name in R

I have a data frame containing two columns of identifying information and one column of letter pairs that are separated by hyphens:

df<-data.frame(
    list = rep(1:3, each = 2),
    set =  rep(c("A","B"), times = 3),
    item = c("ab-cd","ef-gh","ij-kl","mn-op","qr-st","uv-wx")  
    )

What I have been struggling to accomplish is a transformation of the data frame to the following form, in which: 1. The separate rows indexed by "list" are collapsed into one row each; 2. The "item" column is split into adjacent columns with the hyphen as delimiter; 3. The "set" column serving as the basis for naming the resulting columns.

df2 <- data.frame(
       list = c(1:3),
       A_1 = c("ab", "ij", "qr"),
       A_2 = c("cd", "kl", "st"),
       B_1 = c("ef", "mn", "uv"), 
       B_2 = c("gh", "op", "wx"))

I've referred to a number of previously posted questions (particularly [here]) and tried to approach the solution step-by-step using the BASE transpose function along with various binding functions, along with the reshape package and the splitstackshape package. None of my near-solutions are elegant, and I'm wondering if there is a good way of doing this.

As always, I'm incredibly indebted to the community here for advice and feedback.

like image 507
Steve'sConnect Avatar asked Mar 13 '23 14:03

Steve'sConnect


1 Answers

With many thanks to @AnandaMahto: it can be done in one line by first reshaping and then splitting instead of the other way around.

library(splitstackshape)
cSplit(dcast(as.data.table(df), list ~ set, value.var = "item"), c("A", "B"), "-")

We can accomplish in two steps using base R and reshape2.

First, we create columns '1' and '2' containing the split items. Normally, we wouldn't use numerical characters to start a column name with, but it saves us a step in renaming our result columns later.

df[,c("1","2")] <- do.call(rbind,strsplit(as.character(df$item),"-"))

Then we use recast:

res <- recast(data=df, list~set+variable, measure.var=c("1","2"))
res

  list A_1 A_2 B_1 B_2
1    1  ab  cd  ef  gh
2    2  ij  kl  mn  op
3    3  qr  st  uv  wx
like image 133
Heroka Avatar answered Apr 26 '23 13:04

Heroka