I have a table like this:
>head(X)
column1 column2
sequence1 ATCGATCGATCG
sequence2 GCCATGCCATTG
I need an output in a fasta file, looking like this:
sequence1
ATCGATCGATCG
sequence2
GCCATGCCATTG
So, basically, I need every entry of the 2nd column to become a new row, interleaved with the entries of the first column. The old 2nd column can then be discarded.
The way I would normally do that is by replacing the whitespace (or tab) with \n in Notepad++, but I fear my files will be too big for that.
Is there a way to do that in R?
I had the same question and found a really easy way to convert a data frame to a fasta file using the seqRFLP package.
First, install and load seqRFLP:
install.packages("seqRFLP")
library("seqRFLP")
Your sequences need to be in a data frame with sequence headers in column 1 and sequences in column 2 (it doesn't matter whether they are nucleotide or amino acid sequences).
Here is a sample data frame:
names <- c("seq1","seq2","seq3","seq4")
sequences<-c("EPTFYQNPQFSVTLDKR","SLLEDPCYIGLR","YEVLESVQNYDTGVAK","VLGALDLGDNYR")
df <- data.frame(names,sequences)
Then convert the data frame to .fasta format using the function dataframe2fas:
df.fasta = dataframe2fas(df, file="df.fasta")
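If installing a package is not an option, the same idea can be sketched in base R. This is only a minimal sketch, not seqRFLP's implementation: `write_fasta` is a hypothetical helper name, and it assumes headers are in column 1, sequences in column 2, and that you want the conventional `>` in front of each header line.

```r
# Hypothetical helper (not part of seqRFLP): write a two-column
# data frame (headers, sequences) as a FASTA file using base R only.
write_fasta <- function(df, path) {
  headers <- paste0(">", as.character(df[[1]]))  # FASTA header lines start with ">"
  seqs <- as.character(df[[2]])
  # rbind() puts headers in row 1 and sequences in row 2; c() flattens the
  # matrix column by column, which interleaves header, sequence, header, ...
  writeLines(c(rbind(headers, seqs)), path)
}

df <- data.frame(
  names = c("seq1", "seq2"),
  sequences = c("EPTFYQNPQFSVTLDKR", "SLLEDPCYIGLR"),
  stringsAsFactors = FALSE
)

out <- tempfile(fileext = ".fasta")  # temporary path, for illustration only
write_fasta(df, out)
fasta_lines <- readLines(out)        # read back to inspect the result
```

If you want the bare names exactly as in the question (no `>`), drop the `paste0(">", ...)` call.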
Another option is to transpose each row of X and stack the results with rbind:
D <- do.call(rbind, lapply(seq(nrow(X)), function(i) t(X[i, ])))
D
# 1
# column1 "sequence1"
# column2 "ATCGATCGATCG"
# column1 "sequence2"
# column2 "GCCATGCCATTG"
Then, when you write to file, you could use (add a file = argument to write to disk instead of the console)
write.table(D, row.names = FALSE, col.names = FALSE, quote = FALSE)
# sequence1
# ATCGATCGATCG
# sequence2
# GCCATGCCATTG
so that the row names, column names, and quotes will be gone.
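For large tables, a vectorised variant may be faster than looping over rows. The sketch below assumes X has exactly the two columns shown in the question; the sample data and the temporary output path are made up for illustration.

```r
# Sample data shaped like the table in the question (illustrative only).
X <- data.frame(
  column1 = c("sequence1", "sequence2"),
  column2 = c("ATCGATCGATCG", "GCCATGCCATTG"),
  stringsAsFactors = FALSE
)

# t() turns the n x 2 character matrix into 2 x n; as.vector() then reads it
# column by column, giving name, sequence, name, sequence, ...
interleaved <- as.vector(t(as.matrix(X)))

out_file <- tempfile(fileext = ".fasta")  # temporary path, for illustration only
writeLines(interleaved, out_file)         # one element per line, no quotes
```

`writeLines` writes each element on its own line without quotes or row names, so no extra arguments are needed.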