Fastest way to add Ns to variable length sequences such that they all equal 150bp

Say I have a fasta containing 3 sequences...

ATTTTTGGA
AT
A

I want my sequence data to look like this:

ATTTTTGGA
ATNNNNNNN
ANNNNNNNN

Are there any programs or scripts that could accomplish this in a reasonable timeframe? I have thousands of sequences. Thanks!

I've been experimenting and tried the script below. The output file ended up blank, but this is as far as I have gotten.

import sys
from Bio import SeqIO
from Bio.Seq import Seq
in_file = open(sys.argv[1],'r')
sequences = SeqIO.parse(in_file, "fasta")
output_in_file = open("test.fasta", "w")
for record in sequences:
    n = 150
    record.seq = record.seq + ("N" * n)
    seq = seq[:n]
output_in_file.close()
in_file.close()
user3105519 asked Feb 24 '17 01:02



1 Answer

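Your loop never writes any record to the output file, which is why it ends up blank. Improving your code so that every sequence is padded to the length of the longest one: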

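import sys
from Bio import SeqIO

# Read every record first: the longest length must be known before padding.
with open(sys.argv[1], "r") as in_file:
    sequences = list(SeqIO.parse(in_file, "fasta"))

n = max(map(len, sequences))  # length of the longest sequence
for record in sequences:
    # "N" * k is an empty string when k <= 0, so the longest record is unchanged
    record.seq = record.seq + ("N" * (n - len(record)))

# Write all padded records in one call
SeqIO.write(sequences, "test.fasta", "fasta")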

This produces the following in test.fasta:

>id_1
ATTTTTGGA
>id_2
ATNNNNNNN
>id_3
ANNNNNNNN
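
The padding step itself does not depend on Biopython. As a minimal sketch on plain strings (the helper name pad_to_longest is made up for illustration), str.ljust does the same job as the "N" * (n - len(record)) expression above:

```python
def pad_to_longest(seqs, fill="N"):
    """Right-pad every string in seqs with fill up to the longest length."""
    if not seqs:
        return []
    width = max(len(s) for s in seqs)
    # ljust returns strings that are already `width` long unchanged
    return [s.ljust(width, fill) for s in seqs]

padded = pad_to_longest(["ATTTTTGGA", "AT", "A"])
print("\n".join(padded))
```

This only illustrates the arithmetic; for real FASTA files the headers still have to be carried along with each sequence.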

To make every sequence exactly 150 bp long, as the title asks, set n = 150 instead:

import sys
from Bio import SeqIO

with open(sys.argv[1], "r") as in_file:
    sequences = list(SeqIO.parse(in_file, "fasta"))

n = 150  # target length in bp
for record in sequences:
    # Records already longer than n are left unchanged (no truncation here)
    record.seq = record.seq + ("N" * (n - len(record)))

# Write all padded records in one call
SeqIO.write(sequences, "test.fasta", "fasta")

This produces:

>id_1
ATTTTTGGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>id_2
ATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>id_3
ANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
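
With a fixed target length, the records do not all need to be held in a list, which may matter for larger files. A rough, dependency-free sketch follows. read_fasta, pad_record and pad_fasta are hypothetical helpers written for this example, not Biopython functions, and truncating sequences longer than 150 bp is an assumption about what is wanted (the Biopython version above leaves them untouched):

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text handle, one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def pad_record(seq, width=150, fill="N"):
    """Truncate seq to width (an assumption), then right-pad with fill to exactly width."""
    return seq[:width].ljust(width, fill)

def pad_fasta(in_handle, out_handle, width=150):
    """Stream records from in_handle to out_handle; returns the number of records written."""
    count = 0
    for header, seq in read_fasta(in_handle):
        out_handle.write(f">{header}\n{pad_record(seq, width)}\n")
        count += 1
    return count

# Small in-memory demonstration; real use would pass open file objects instead.
demo_in = io.StringIO(">id_1\nATTTTTGGA\n>id_2\nAT\n>id_3\nA\n")
demo_out = io.StringIO()
n_records = pad_fasta(demo_in, demo_out, width=150)
```

Unlike SeqIO.write, which wraps sequence lines at 60 characters as in the output above, this sketch writes each sequence on a single line.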
Jose Ricardo Bustos M. answered Oct 18 '22 00:10
