Biopython: is there a one-liner to extract the amino acid sequence of a specific chain from a PDB file?

I want to extract the single letter amino acid sequence of specific chains from a bunch of PDB files.

I'm able to do it using SeqIO.parse(), but it feels quite unpythonic in my opinion:

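from Bio import SeqIO

PDB_file_path = '/full/path/to/some/pdb'
query_chain_id = '6Q62:A'  # example id in '<PDB ID>:<chain>' form

# Is there a one-liner for this?
query_seqres = SeqIO.parse(PDB_file_path, 'pdb-seqres')

for chain in query_seqres:
    if chain.id == query_chain_id:
        query_chain = chain.seq
        break  # record ids are unique, so stop at the first match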

Is there a more concise and clearer way of doing this?
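For context, the 'pdb-seqres' format reads the SEQRES records, which list each chain's full sequence as three-letter residue names in fixed columns, and yields one record per chain. The following is a simplified, dependency-free sketch of that idea, not Biopython's implementation: THREE_TO_ONE and seqres_by_chain are hypothetical names, the table covers only the 20 standard amino acids (anything else becomes 'X'), DBREF records are ignored so the keys are plain chain letters, and the example lines are shorter than real SEQRES lines.

```python
from __future__ import annotations

# Hypothetical lookup table: only the 20 standard amino acids (anything else -> 'X').
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_by_chain(pdb_text: str) -> dict[str, str]:
    """Return {chain_id: one-letter sequence} built from the SEQRES records."""
    chains: dict[str, list[str]] = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11]        # column 12: chain identifier
            names = line[19:].split()  # columns 20 onwards: residue names
            # A chain's sequence can continue over several SEQRES lines
            chains.setdefault(chain_id, []).extend(
                THREE_TO_ONE.get(name, "X") for name in names
            )
    return {chain_id: "".join(letters) for chain_id, letters in chains.items()}


example = (
    "SEQRES   1 A    5  MET ALA GLY\n"
    "SEQRES   2 A    5  LYS VAL\n"
    "SEQRES   1 B    2  SER LYS\n"
)
print(seqres_by_chain(example))  # {'A': 'MAGKV', 'B': 'SK'}
```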

Gabriel Cia asked Oct 24 '25 03:10



2 Answers

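Expanding on @BioGeek's answer, here is the analogous code for extracting the sequence with PDBParser.get_structure() instead of SeqIO.parse(). It is not strictly equivalent: PDBParser builds the sequence from the coordinate (ATOM/HETATM) records, so residues that were not resolved in the experiment are missing and waters or ligands have to be skipped, whereas 'pdb-seqres' reads the full sequence from the SEQRES records. The chain keys are also plain letters ('A') rather than ids like '6Q62:A'.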

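from Bio.PDB import PDBParser
from Bio.SeqUtils import seq1

PDB_ID = '6q62'              # any label; only used to name the Structure object
PDB_file_path = '6q62.pdb'
query_chain_id = 'A'         # a plain chain letter here, not '6Q62:A'

pdbparser = PDBParser(QUIET=True)  # QUIET=True suppresses structure-construction warnings

structure = pdbparser.get_structure(PDB_ID, PDB_file_path)
# For each chain of the first model, join the three-letter names of the standard
# (ATOM) residues and convert them to one-letter codes. residue.id[0] is ' ' only
# for ATOM records, so waters, ligands and HETATM-coded modified residues (e.g. MSE) are skipped.
chains = {chain.id: seq1(''.join(residue.resname for residue in chain if residue.id[0] == ' '))
          for chain in structure[0]}

query_chain = chains[query_chain_id]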
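To see why filtering the residues matters, here is a small sketch (requires Biopython) that parses a made-up two-chain PDB text with one water from memory; PDBParser.get_structure() accepts an open file handle as well as a path. seq1() turns any three-letter code it does not recognize into 'X', so the unfiltered comprehension picks up the water as an extra 'X'. The helper pdb_line and the coordinates are hypothetical illustration data.

```python
import io

from Bio.PDB import PDBParser
from Bio.SeqUtils import seq1


def pdb_line(record, serial, name, resname, chain, resseq, element):
    """Format one fixed-column ATOM/HETATM line (hypothetical helper, dummy coordinates)."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10}{element:>2}")


pdb_text = "\n".join([
    pdb_line("ATOM", 1, " CA ", "ALA", "A", 1, "C"),
    pdb_line("ATOM", 2, " CA ", "GLY", "A", 2, "C"),
    pdb_line("HETATM", 3, " O  ", "HOH", "A", 101, "O"),  # a water in chain A
    pdb_line("ATOM", 4, " CA ", "SER", "B", 1, "C"),
    "END",
]) + "\n"

structure = PDBParser(QUIET=True).get_structure("demo", io.StringIO(pdb_text))
model = structure[0]

# Every residue: the water (HOH) is not an amino acid, so seq1() maps it to 'X'
naive = {chain.id: seq1("".join(res.resname for res in chain)) for chain in model}
# Standard residues only: res.id[0] is ' ' for ATOM records, 'W' for waters
filtered = {chain.id: seq1("".join(res.resname for res in chain if res.id[0] == " "))
            for chain in model}

print(naive)     # {'A': 'AGX', 'B': 'S'}
print(filtered)  # {'A': 'AG', 'B': 'S'}
```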
Gabriel Cia answered Oct 27 '25 00:10



In my opinion it is not much more Pythonic, but you could use a dictionary comprehension to turn the generator into an explicit dict:

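from Bio import SeqIO

PDB_file_path = '6q62.pdb'
query_chain_id = '6Q62:A'   # record ids here look like '<PDB ID>:<chain>'

# Map every record id to its sequence, then look up the chain of interest
chains = {record.id: record.seq for record in SeqIO.parse(PDB_file_path, 'pdb-seqres')}
query_chain = chains[query_chain_id]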
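If only one chain is needed, the dict can be skipped: next() with a generator expression returns the first matching sequence instead of building an entry for every chain, and its second argument supplies a default when the chain is absent. This sketch (requires Biopython) matches on record.annotations["chain"], the plain chain letter that the pdb-seqres parser stores on each record, rather than on the full id, and reads a made-up two-chain SEQRES snippet from memory instead of a real file.

```python
import io

from Bio import SeqIO

# Tiny in-memory stand-in for a PDB file: SEQRES records only (illustrative data)
pdb_text = (
    "SEQRES   1 A    3  MET ALA GLY\n"
    "SEQRES   1 B    2  SER LYS\n"
    "END\n"
)

query_chain = "B"  # plain chain letter

# First record whose chain letter matches, or None if there is no such chain
query_seq = next(
    (record.seq for record in SeqIO.parse(io.StringIO(pdb_text), "pdb-seqres")
     if record.annotations["chain"] == query_chain),
    None,
)
print(query_seq)  # SK
```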
BioGeek answered Oct 27 '25 01:10
