strsplit one column with exact information into two columns

Tags: split, r

I have data looking like this:

    SNP Geno Allele
marker1   G1    AA
marker2   G1    TT
marker3   G1    TT
marker1   G2    CC
marker2   G2    AA
marker3   G2    TT
marker1   G3    GG
marker2   G3    AA
marker3   G3    TT

And I want it to look like this:

    SNP Geno Allele1 Allele2
marker1   G1       A       A
marker2   G1       T       T
marker3   G1       T       T
marker1   G2       C       C
marker2   G2       A       A
marker3   G2       T       T
marker1   G3       G       G
marker2   G3       A       A
marker3   G3       T       T

I am using this:

strsplit(Allele, split extended = TRUE)

But this is not working. Do I need additional commands?

asked May 02 '12 21:05 by marie


People also ask

How do I split one column into multiple columns in R?

To split a column into multiple columns in R, you can use the separate() function from the tidyr package. separate() splits a character column into multiple columns using either a regular expression or numeric positions.

How do you split a variable in R?

The split() function in R can be used to split data into groups based on factor levels. This function uses the following basic syntax: split(x, f, …)
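As a rough illustration of that syntax, the sketch below groups a small made-up data frame by its `Geno` column. The data frame `toy` and the name `groups` are hypothetical and only serve the example. Note that `split()` divides rows into groups; it does not break strings apart, which is what `strsplit()` is for.

```r
# Hypothetical toy data, not the data from the question
toy <- data.frame(
  SNP  = c("marker1", "marker2", "marker1", "marker2"),
  Geno = c("G1", "G1", "G2", "G2"),
  stringsAsFactors = FALSE
)

# split(x, f): one data frame per distinct value of Geno, as a named list
groups <- split(toy, toy$Geno)

names(groups)     # the group labels
nrow(groups$G1)   # number of rows that fell into group G1
```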


1 Answer

Another approach, from start to finish:

Make reproducible data:

dat <- read.table(header = TRUE,  text = "SNP Geno    Allele
marker1 G1  AA
marker2 G1  TT
marker3 G1  TT
marker1 G2  CC
marker2 G2  AA
marker3 G2  TT
marker1 G3  GG
marker2 G3  AA
marker3 G3  TT")

UPDATED Extract the Allele column, split it into individual characters, then make those characters into two columns of a data frame:

EITHER

dat1 <- data.frame(t(matrix(
                     unlist(strsplit(as.vector(dat$Allele), split = "")), 
                     ncol = length(dat$Allele), nrow = 2)))

OR following @joran's suggestion

dat1 <- data.frame(do.call(rbind, strsplit(as.vector(dat$Allele), split = "")))
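To see why the two variants should give the same result, it can help to look at the intermediate objects. The sketch below uses a short hypothetical vector `alleles` in place of `dat$Allele`; the names `pieces`, `m1` and `m2` are also made up for illustration. It assumes every string has exactly two characters, which both variants rely on.

```r
# Hypothetical stand-in for dat$Allele
alleles <- c("AA", "TT", "CG")

# split = "" breaks each string into single characters;
# the result is a list with one character vector per input string
pieces <- strsplit(alleles, split = "")

# Variant 1: flatten the list, fill a 2-row matrix column by column,
# then transpose so each input string becomes one row
m1 <- t(matrix(unlist(pieces), nrow = 2))

# Variant 2: bind the list elements together directly as rows
m2 <- do.call(rbind, pieces)

# Compare the two matrices element by element
all(m1 == m2)
```

The `do.call(rbind, ...)` form is shorter and avoids having to state the matrix dimensions by hand.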

THEN

Add column names to the new columns:

names(dat1) <- c("Allele1", "Allele2")

Attach the two new columns to columns from the original data table, as @user1317221 suggests:

dat3 <- cbind(dat[c("SNP", "Geno")], dat1)
dat3
      SNP Geno Allele1 Allele2
1 marker1   G1       A       A
2 marker2   G1       T       T
3 marker3   G1       T       T
4 marker1   G2       C       C
5 marker2   G2       A       A
6 marker3   G2       T       T
7 marker1   G3       G       G
8 marker2   G3       A       A
9 marker3   G3       T       T
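
Because every genotype here is exactly two characters long, the same table could also be built by character position instead of by splitting. The following is a minimal sketch of that alternative, not part of the steps above. It assumes `Allele` always holds exactly two characters, uses a shortened copy of the data so that it runs on its own, and the name `dat4` is made up for illustration.

```r
# Shortened copy of the example data so the sketch is self-contained
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "SNP Geno Allele
marker1 G1 AA
marker2 G1 TT
marker3 G2 CC")

# Pick the first and the second character of each genotype by position
dat4 <- data.frame(
  SNP     = dat$SNP,
  Geno    = dat$Geno,
  Allele1 = substr(dat$Allele, 1, 1),
  Allele2 = substr(dat$Allele, 2, 2)
)
dat4
```

This names all four columns in one step, so no separate `names()` or `cbind()` call is needed, but it would silently drop characters if a genotype were ever longer than two letters.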
answered Nov 14 '22 21:11 by Ben