Convert table into fasta in R

Tags: r, fasta

I have a table like this:

>head(X)
column1    column2
sequence1 ATCGATCGATCG
sequence2 GCCATGCCATTG

I need an output in a fasta file, looking like this:

sequence1  
ATCGATCGATCG
sequence2  
GCCATGCCATTG

Basically, each entry of the 2nd column should become a new row, interleaved with the entries of the first column. The old 2nd column can then be discarded.

I would normally do this by replacing the whitespace (or tab) with \n in Notepad++, but I fear my files are too big for that.

Is there a way to do this in R?

user3586764 asked Dec 14 '22 23:12


2 Answers

I had the same question and found an easy way to convert a data frame to a FASTA file using the seqRFLP package.

First, install and load seqRFLP:

install.packages("seqRFLP")
library("seqRFLP")

Your sequences need to be in a data frame with the sequence headers in column 1 and the sequences in column 2 (it doesn't matter whether they are nucleotide or amino acid sequences).

Here is a sample data frame:

names <- c("seq1", "seq2", "seq3", "seq4")
sequences <- c("EPTFYQNPQFSVTLDKR", "SLLEDPCYIGLR", "YEVLESVQNYDTGVAK", "VLGALDLGDNYR")
df <- data.frame(names, sequences)

Then convert the data frame to FASTA format with the dataframe2fas function:

df.fasta = dataframe2fas(df, file="df.fasta")

Steph Bannister answered Jan 09 '23 18:01
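
If installing a package is not an option, the same interleaving can be sketched in base R. This is only an illustration, not the seqRFLP implementation: the two-row data frame and the temporary output path are made up for the example, and the ">" prefix is added on the assumption that downstream tools expect standard FASTA header lines.

```r
# Hypothetical sample data in the same layout as the data frame above
df <- data.frame(
  names = c("seq1", "seq2"),
  sequences = c("EPTFYQNPQFSVTLDKR", "SLLEDPCYIGLR"),
  stringsAsFactors = FALSE
)

# rbind() stacks headers and sequences as two rows of a matrix;
# c() then reads the matrix column by column, which yields
# header, sequence, header, sequence, ...
fasta_lines <- c(rbind(paste0(">", df$names), as.character(df$sequences)))

# Hypothetical output path; replace with a real file name as needed
out_file <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, out_file)
```

Because everything is vectorised, this avoids looping over rows, which may matter for the large tables mentioned in the question.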


# Transpose each row into a two-element column, then stack the columns
D <- do.call(rbind, lapply(seq_len(nrow(X)), function(i) t(X[i, ])))
D
#         1             
# column1 "sequence1"   
# column2 "ATCGATCGATCG"
# column1 "sequence2"   
# column2 "GCCATGCCATTG"

Then, to write it out, you could use write.table(). With no file argument it prints to the console, so pass file = to write to disk:

write.table(D, row.names = FALSE, col.names = FALSE, quote = FALSE)
# sequence1
# ATCGATCGATCG
# sequence2
# GCCATGCCATTG

The three FALSE arguments drop the row names, column names, and quotes from the output.
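
As a rough end-to-end sketch of this approach writing to an actual file: the data frame X below is reconstructed from the question, the temporary file path is a placeholder, and prefixing the headers with ">" is an optional assumption (the output requested in the question has no such prefix, but most FASTA readers expect it).

```r
# Hypothetical reconstruction of the table X from the question
X <- data.frame(
  column1 = c("sequence1", "sequence2"),
  column2 = c("ATCGATCGATCG", "GCCATGCCATTG"),
  stringsAsFactors = FALSE
)

# Optional assumption: mark header lines with ">" as in standard FASTA
X$column1 <- paste0(">", X$column1)

# Same row-by-row transposition as above, giving a one-column matrix
D <- do.call(rbind, lapply(seq_len(nrow(X)), function(i) t(X[i, ])))

# Placeholder output path; file = sends the output to disk instead of the console
out_file <- tempfile(fileext = ".fasta")
write.table(D, file = out_file, row.names = FALSE, col.names = FALSE, quote = FALSE)
```

Reading the file back with readLines(out_file) is a quick way to confirm that headers and sequences alternate line by line.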

Rich Scriven answered Jan 09 '23 17:01