Biopython: is there a one-liner to extract the amino acid sequence of a specific chain from a PDB file?

Question

I want to extract the single-letter amino acid sequence of specific chains from a set of PDB files.

I can do it with SeqIO.parse(), but the loop feels unpythonic to me:

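from Bio import SeqIO

PDB_file_path = '/full/path/to/some/pdb'
query_chain_id = '6Q62:A'  # example; 'pdb-seqres' record IDs look like '<PDB ID>:<chain>'

# Is there a 1-liner for this?
query_seqres = SeqIO.parse(PDB_file_path, 'pdb-seqres')

for chain in query_seqres:
    if chain.id == query_chain_id:
        query_chain = chain.seq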

Is there a more concise and clearer way of doing this?

Gabriel Cia · Accepted Answer

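Expanding on @BioGeek's answer, here is the equivalent code to extract the sequence using PDBParser.get_structure() instead of SeqIO.parse(). Note that this builds the sequence from the residues actually present in the coordinate records, so it can differ from the SEQRES sequence when some residues are unresolved.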

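from Bio.PDB import PDBParser
from Bio.SeqUtils import seq1

pdbparser = PDBParser()

# PDB_ID is any label for the structure; chain IDs here are plain letters such as 'A'
structure = pdbparser.get_structure(PDB_ID, PDB_file_path)
# residue.id[0] == ' ' keeps standard residues and skips waters and other
# hetero residues, which seq1 would otherwise render as 'X' or misalign
chains = {
    chain.id: seq1(''.join(residue.resname for residue in chain if residue.id[0] == ' '))
    for chain in structure.get_chains()
}

query_chain = chains[query_chain_id]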

BioGeek · Answer

In my opinion it is not much more Pythonic, but you could use a dictionary comprehension to turn the generator into an explicit dict:

from Bio import SeqIO

PDB_file_path = '6q62.pdb'
query_chain_id = '6Q62:A'

# Map each SEQRES record ID to its sequence, then look up the chain of interest
chains = {record.id: record.seq for record in SeqIO.parse(PDB_file_path, 'pdb-seqres')}
query_chain = chains[query_chain_id]
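
If only one chain is needed, a possible alternative is next() over a generator expression, which stops reading at the first match instead of building the whole dict. The sketch below is an illustration, not part of either answer: the helper names find_chain_seq and seqres_chain_seq are hypothetical, and returning None for a missing chain is an assumption about the desired behaviour.

```python
def find_chain_seq(records, chain_id, default=None):
    """Return .seq of the first record whose .id equals chain_id, else default."""
    # next() pulls items from the generator only until the first match.
    return next((rec.seq for rec in records if rec.id == chain_id), default)


def seqres_chain_seq(pdb_path, chain_id):
    """Look up one chain in the SEQRES records of a PDB file (requires Biopython)."""
    # Imported here so find_chain_seq stays usable on any iterable of records.
    from Bio import SeqIO
    return find_chain_seq(SeqIO.parse(pdb_path, "pdb-seqres"), chain_id)
```

With the variables defined above, this would be called as query_chain = seqres_chain_seq(PDB_file_path, query_chain_id). Inlined, it becomes the one-liner next((r.seq for r in SeqIO.parse(PDB_file_path, 'pdb-seqres') if r.id == query_chain_id), None).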
