Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to handle multiple optimal edit paths implementing Needleman-Wunsche algorithm?

Trying to implement Needleman-Wunsche algorithm for biological sequences comparison. In some circumstances there exist multiple optimal edit paths.

  1. What is the common practice in bio-seq-compare tools handling this? Any priority/preferences among substitute/insert/deletion?

  2. If I want to keep multiple edit paths in memory, any data structure is recommended? Or generally, how to store paths with branches and merges?

Any comments appreciated.

like image 353
Frozen Flame Avatar asked Nov 24 '22 06:11

Frozen Flame


1 Answers

  1. If two paths are have identical scores, that means that the likelihood of them is the same no matter which kinds of operations they used. Priority for substitutions vs. insertions or deletions has already been handled in getting that score. So if two scores are the same, common practice is to break the tie arbitrarily.

  2. You should be able to handle this by recording all potential cells that you could have arrived at the current one from in your traceback matrix. Then, during traceback, start a separate branch whenever you come to a branching point. In order to allow for merges too, store some additional data about each cell (how will depend on what language you're using) indicating how many different paths left from it. Then, during traceback, wait at a given cell until that number of paths have arrived back at it, and then merge them into one. You can either be following the different branches with true parallel processing, or by just alternating which one you are advancing.

like image 132
seaotternerd Avatar answered Nov 25 '22 19:11

seaotternerd