renumber residues in a protein structure file (pdb)

Tags:

Hi
I am currently involved in making a website aimed at combining all papillomavirus information in a single place. As part of the effort we are curating all known files on public servers (e.g. genbank) One of the issues I ran into was that many (~50%) of all solved structures are not numbered according to the protein. I.e. a subdomain was crystallized (amino acid 310-450) however the crystallographer deposited this as residue 1-140. I was wondering whether anyone knows of a way to renumber the entire pdb file. I have found ways to renumber the sequence (identified by seqres), however this does not update the helix and sheet information. I would appreciate it if you had any suggestions…
Thanks

235

asked May 12 '11 19:05

Stylize

2 Answers

I'm the maintainer of pdb-tools - which may be a tool that can assist you.

I have recently modified the residue-renumber script within my application to provide more flexibility. It can now renumber hetatms and specific chains, and either force the residue numbers to be continuous or just add a user-specified offset to all residues.

Please let me know if this assists you.

113

answered Oct 18 '22 01:10

Mike Harms

I frequently encounter this problem too. After abandoning an old perl script I had for this I've been experimenting with some python instead. This solution assumes you've got Biopython, ProDy (http://www.csb.pitt.edu/ProDy/#prody) and EMBOSS (http://emboss.sourceforge.net/) installed.

I used one of the papillomavirus PDB entries here.

from Bio import AlignIO,SeqIO,ExPASy,SwissProt
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC
from Bio.Emboss.Applications import NeedleCommandline
from prody.proteins.pdbfile import parsePDB, writePDB
import os

oneletter = {
'ASP':'D','GLU':'E','ASN':'N','GLN':'Q',
'ARG':'R','LYS':'K','PRO':'P','GLY':'G',
'CYS':'C','THR':'T','SER':'S','MET':'M',
'TRP':'W','PHE':'F','TYR':'Y','HIS':'H',
'ALA':'A','VAL':'V','LEU':'L','ILE':'I',
}

# Retrieve pdb to extract sequence
# Can probably be done with Bio.PDB but being able to use the vmd-like selection algebra is nice
pdbname="2kpl"
selection="chain A"
structure=parsePDB(pdbname)
pdbseq_str=''.join([oneletter[i] for i in structure.select("protein and name CA and     %s"%selection).getResnames()])
alnPDBseq=SeqRecord(Seq(pdbseq_str,IUPAC.protein),id=pdbname)
SeqIO.write(alnPDBseq,"%s.fasta"%pdbname,"fasta")

# Retrieve reference sequence
accession="Q96QZ7"
handle = ExPASy.get_sprot_raw(accession)
swissseq = SwissProt.read(handle)
refseq=SeqRecord(Seq(swissseq.sequence,IUPAC.protein),id=accession)
SeqIO.write(refseq, "%s.fasta"%accession,"fasta")

# Do global alignment with needle from EMBOSS, stores entire sequences which makes numbering easier
needle_cli = NeedleCommandline(asequence="%s.fasta"%pdbname,bsequence="%s.fasta"%accession,gapopen=10,gapextend=0.5,outfile="needle.out")
needle_cli()
aln = AlignIO.read("needle.out", "emboss")
os.remove("needle.out")
os.remove("%s.fasta"%pdbname)
os.remove("%s.fasta"%accession)

alnPDBseq = aln[0]
alnREFseq = aln[1]
# Initialize per-letter annotation for pdb sequence record
alnPDBseq.letter_annotations["resnum"]=[None]*len(alnPDBseq)
# Initialize annotation for reference sequence, assume first residue is #1
alnREFseq.letter_annotations["resnum"]=range(1,len(alnREFseq)+1)

# Set new residue numbers in alnPDBseq based on alignment
reslist = [[i,alnREFseq.letter_annotations["resnum"][i]] for i in range(len(alnREFseq)) if alnPDBseq[i] != '-']
for [i,r] in reslist:
    alnPDBseq.letter_annotations["resnum"][i]=r

# Set new residue numbers in the structure
newresnums=[i for i in alnPDBseq.letter_annotations["resnum"][:] if i != None]
resindices=structure.select("protein and name CA and %s"%selection).getResindices()
resmatrix = [[newresnums[i],resindices[i]] for i in range(len(newresnums)) ]
for [newresnum,resindex] in resmatrix:  
    structure.select("resindex %d"%resindex).setResnums(newresnum)

writePDB("%s.renumbered.pdb"%pdbname,structure)

answered Oct 18 '22 00:10

Gordon Wells

Related questions
                            
                                Many-to-many declarative SQLAlchemy definition for users, groups, and roles
                            
                                Why is it not possible to get a Py_buffer from an array object?
                            
                                Grouping a series in Python
                            
                                Why does refs increase 2 for every new object in Python?
                            
                                How to color surface with stronger contrast
                            
                                Why is an instance of webapp.WSGIApplication always defined as a global variable in google app engine code?
                            
                                Where to Put Python Utils Folder?
                            
                                Using threading to keep FTP control port alive
                            
                                Python - Easiest way to scrape text from list of URLs using BeautifulSoup
                            
                                Could python have suffix-based number notation for engineering purposes?
                            
                                Python Raytracing
                            
                                Python: threading + lock slows my app down considerably
                            
                                Django Make ContentType Not Required
                            
                                exe error with cx_freeze
                            
                                is there a nice "python conventions and best practices" summary anywhere?
                            
                                How to add timeout to Deferred from Twisted's deferToThread API?
                            
                                How to use the pipe operator as part of a regular expression?
                            
                                How do you add a string to a deque without breaking the string up into characters?
                            
                                Optimize this python log-parsing code
                            
                                Problem Setting Up A Virtualenv

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

renumber residues in a protein structure file (pdb)

Tags:

python

biopython

Stylize

People also ask

2 Answers

Mike Harms

Gordon Wells

Recent Activity

Donate For Us