I want to extract the single-letter amino acid sequences of specific chains from a bunch of PDB files.
I can do it with SeqIO.parse(), but it feels rather unpythonic to me:
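from Bio import SeqIO
PDB_file_path = '/full/path/to/some/pdb'
query_chain_id = 'XXXX:A'  # placeholder; pdb-seqres record IDs have the form '<PDB ID>:<chain>'
# Is there a one-liner for this?
query_seqres = SeqIO.parse(PDB_file_path, 'pdb-seqres')
for chain in query_seqres:
    if chain.id == query_chain_id:
        query_chain = chain.seq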
Is there a more concise, clearer way of doing this?
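Expanding on @BioGeek's answer, here is the corresponding code for extracting the sequence with PDBParser.get_structure() instead of SeqIO.parse(). Note that it builds the sequence from the residues that have coordinates, so it can differ from the SEQRES sequence when parts of the chain are unresolved: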
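from Bio.PDB import PDBParser
from Bio.SeqUtils import seq1
pdbparser = PDBParser(QUIET=True)  # suppress construction warnings for imperfect files
# The first argument is only a label for the Structure object, e.g. the PDB ID
structure = pdbparser.get_structure(PDB_ID, PDB_file_path)
# Use the first model and skip hetero residues (waters, ligands), which seq1 would turn into 'X'
chains = {chain.id: seq1(''.join(residue.resname for residue in chain if residue.id[0] == ' '))
          for chain in structure[0]}
query_chain = chains[query_chain_id]  # keys are plain chain IDs such as 'A'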
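Iterating over every residue in a chain also picks up HETATM residues such as waters and ligands, which seq1() maps to 'X' by default. The sketch below builds a tiny in-memory structure to show the effect; the pdb_line helper, the toy chain, chain_sequences and its standard_only flag are all hypothetical illustrations, not Biopython API. It relies on Bio.PDB residue IDs being tuples whose first field is ' ' for ATOM records, 'W' for waters and 'H_<resname>' for other HETATM residues, and it reads only the first model because NMR entries repeat chain IDs across models.

```python
from io import StringIO

from Bio.PDB import PDBParser
from Bio.SeqUtils import seq1


def pdb_line(record, serial, name, resname, chain, resseq, element):
    """Format one fixed-width PDB coordinate record (all coordinates zero; demo only)."""
    return (
        f"{record:<6}{serial:>5} {name:^4} {resname:>3} {chain}{resseq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}\n"
    )


def chain_sequences(structure, standard_only=True):
    """Map chain ID -> one-letter sequence for the chains of the first model."""
    model = structure[0]  # NMR entries contain several models that repeat chain IDs
    sequences = {}
    for chain in model:
        # residue.id[0] is ' ' for ATOM records, 'W' for waters, 'H_<resname>' for other HETATMs
        resnames = (
            residue.resname
            for residue in chain
            if not standard_only or residue.id[0] == " "
        )
        sequences[chain.id] = seq1("".join(resnames))
    return sequences


# Hypothetical toy chain A: two amino acids followed by one water molecule
toy_pdb = "".join([
    pdb_line("ATOM", 1, "CA", "ALA", "A", 1, "C"),
    pdb_line("ATOM", 2, "CA", "GLY", "A", 2, "C"),
    pdb_line("HETATM", 3, "O", "HOH", "A", 101, "O"),
])
structure = PDBParser(QUIET=True).get_structure("toy", StringIO(toy_pdb))

print(chain_sequences(structure, standard_only=False))  # {'A': 'AGX'}
print(chain_sequences(structure))                       # {'A': 'AG'}
```

One trade-off of filtering on the hetero flag: modified residues that are deposited as HETATM records (selenomethionine, MSE, is a common case) are dropped as well, so check which behaviour you want for your files.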
In my opinion it is not much more Pythonic, but you could use a dictionary comprehension to turn the iterator returned by SeqIO.parse() into an explicit dict:
from Bio import SeqIO
PDB_file_path = '6q62.pdb'
query_chain_id = '6Q62:A'
chains = {record.id: record.seq for record in SeqIO.parse(PDB_file_path, 'pdb-seqres')}
query_chain = chains[query_chain_id]
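Since the question asks for a one-liner, here is a minimal sketch using next() with a generator expression: it stops at the first matching record and returns None when no record matches, instead of reading the whole file. Biopython also provides SeqIO.to_dict(), which builds the same kind of mapping as the dictionary comprehension above (keyed by record.id, with SeqRecord objects as values) and raises ValueError if two records share an ID. The chain_seq name and the in-memory records (IDs 'XXXX:A' and 'XXXX:B' and their sequences) are hypothetical stand-ins for a parsed PDB file.

```python
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def chain_seq(records, chain_id):
    """Return the Seq of the first record whose id equals chain_id, or None."""
    # next() stops at the first match instead of scanning every record
    return next((record.seq for record in records if record.id == chain_id), None)


# Hypothetical in-memory records standing in for SeqIO.parse(PDB_file_path, 'pdb-seqres')
records = [
    SeqRecord(Seq("MKTAYIAK"), id="XXXX:A"),
    SeqRecord(Seq("GSHMLE"), id="XXXX:B"),
]

print(chain_seq(records, "XXXX:B"))  # GSHMLE
print(chain_seq(records, "XXXX:C"))  # None

# SeqIO.to_dict maps record.id -> SeqRecord, so take .seq from the value
by_id = SeqIO.to_dict(records)
print(by_id["XXXX:A"].seq)           # MKTAYIAK
```

With a real file the one-liner becomes query_chain = chain_seq(SeqIO.parse(PDB_file_path, 'pdb-seqres'), query_chain_id).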