I have data looking like this:
SNP Geno Allele
marker1 G1 AA
marker2 G1 TT
marker3 G1 TT
marker1 G2 CC
marker2 G2 AA
marker3 G2 TT
marker1 G3 GG
marker2 G3 AA
marker3 G3 TT
And I want it to look like this:
SNP Geno Allele1 Allele2
marker1 G1 A A
marker2 G1 T T
marker3 G1 T T
marker1 G2 C C
marker2 G2 A A
marker3 G2 T T
marker1 G3 G G
marker2 G3 A A
marker3 G3 T T
I am using this:
strsplit(Allele, split extended = TRUE)
But this is not working. Do I need additional commands?
To split a column into multiple columns in R, use the separate() function from the tidyr package. separate() splits a character column into multiple columns using either a regular expression or numeric character positions.
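As a minimal sketch of that approach, assuming the tidyr package is installed and that every genotype is exactly two characters long, a numeric `sep = 1` tells separate() to cut each string after its first character. The three-row data frame here is only a cut-down copy of the question's data for illustration.

```r
library(tidyr)

# Small illustrative subset of the question's data
dat <- data.frame(
  SNP    = c("marker1", "marker2", "marker3"),
  Geno   = c("G1", "G1", "G1"),
  Allele = c("AA", "TT", "TT"),
  stringsAsFactors = FALSE
)

# A numeric sep is read as a character position:
# sep = 1 splits each string after its first character
res <- separate(dat, Allele, into = c("Allele1", "Allele2"), sep = 1)
print(res)
```

By default separate() drops the original Allele column; pass `remove = FALSE` to keep it alongside the new ones.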
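The split() function in R splits data into groups based on factor levels. Its basic syntax is: split(x, f, …). Note that it divides rows into groups; it does not break a string into its characters, so on its own it will not produce the Allele1 and Allele2 columns asked for here.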
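To make the distinction concrete, here is a short sketch of what split() does with data shaped like the question's: it returns a list with one data frame per genotype and leaves the Allele strings untouched. The variable name `groups` is just an illustrative choice.

```r
# Data from the question, built directly as a data frame
dat <- data.frame(
  SNP    = rep(c("marker1", "marker2", "marker3"), times = 3),
  Geno   = rep(c("G1", "G2", "G3"), each = 3),
  Allele = c("AA", "TT", "TT", "CC", "AA", "TT", "GG", "AA", "TT"),
  stringsAsFactors = FALSE
)

# split() groups the rows by the levels of Geno
groups <- split(dat, dat$Geno)

names(groups)   # one list element per genotype
groups$G2       # the three rows belonging to genotype G2
```

This is useful for per-genotype processing, but the two-letter allele strings still need strsplit(), substr(), or separate() to be broken apart.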
Another approach, from start to finish:
Make reproducible data:
dat <- read.table(header = TRUE, text = "SNP Geno Allele
marker1 G1 AA
marker2 G1 TT
marker3 G1 TT
marker1 G2 CC
marker2 G2 AA
marker3 G2 TT
marker1 G3 GG
marker2 G3 AA
marker3 G3 TT")
UPDATED Extract the Allele column, split it into individual characters, then make those characters into two columns of a data frame:
EITHER
dat1 <- data.frame(t(matrix(
unlist(strsplit(as.vector(dat$Allele), split = "")),
ncol = length(dat$Allele), nrow = 2)))
OR following @joran's suggestion
dat1 <- data.frame(do.call(rbind, strsplit(as.vector(dat$Allele), split = "")))
THEN
Add column names to the new columns:
names(dat1) <- c("Allele1", "Allele2")
Attach the two new columns to columns from the original data table, as @user1317221 suggests:
dat3 <- cbind(dat[, c("SNP", "Geno")], dat1)
SNP Geno Allele1 Allele2
1 marker1 G1 A A
2 marker2 G1 T T
3 marker3 G1 T T
4 marker1 G2 C C
5 marker2 G2 A A
6 marker3 G2 T T
7 marker1 G3 G G
8 marker2 G3 A A
9 marker3 G3 T T
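For comparison, the same result can probably be reached in base R without the intermediate matrix. This is only a sketch under the assumption that every Allele value is exactly two characters long; `dat2` is a hypothetical name chosen here to avoid overwriting the objects above.

```r
# Re-create the example data (stringsAsFactors = FALSE keeps Allele as character)
dat2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "SNP Geno Allele
marker1 G1 AA
marker2 G1 TT
marker3 G1 TT
marker1 G2 CC
marker2 G2 AA
marker3 G2 TT
marker1 G3 GG
marker2 G3 AA
marker3 G3 TT")

# substr() is vectorised: take the first and second character of every string
dat2$Allele1 <- substr(dat2$Allele, 1, 1)
dat2$Allele2 <- substr(dat2$Allele, 2, 2)

# Drop the original combined column
dat2$Allele <- NULL

print(dat2)
```

The strsplit() route above generalises more easily to strings of other lengths, while substr() keeps the original column names and needs no cbind() step.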