I've got the following tiny Python method that is by far the performance hotspot (according to my profiler, >95% of execution time is spent here) in a much larger program:
def topScore(self, seq):
    ret = -1e9999
    logProbs = self.logProbs  # save indirection
    l = len(logProbs)
    for i in xrange(len(seq) - l + 1):
        score = 0.0
        for j in xrange(l):
            score += logProbs[j][seq[j + i]]
        ret = max(ret, score)
    return ret
The code is being run in the Jython implementation of Python, not CPython, if that matters. seq is a DNA sequence string, on the order of 1,000 elements. logProbs is a list of dictionaries, one for each position. The goal is to find the maximum score of any length-l (on the order of 10-20 elements) subsequence of seq.
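For concreteness, the inputs look roughly like this (the values below are made up; only the shapes match my real data):

# Illustrative shapes only; the real values are produced elsewhere in the program.
# One dict per motif position, mapping each base to its log-probability.
logProbs = [
    {'A': -0.5, 'C': -2.0, 'G': -1.5, 'T': -2.5},  # position 0
    {'A': -1.8, 'C': -0.4, 'G': -2.2, 'T': -2.0},  # position 1
    # ... l entries in total, with l around 10-20
]
seq = "ACGTACGTAC"  # in reality ~1,000 bases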
I realize all this looping is inefficient due to interpretation overhead and would be a heck of a lot faster in a statically compiled/JIT'd language. However, I'm not willing to switch languages outright: first, I need a JVM language for the libraries I'm using, which constrains my choices, and second, I don't want to translate this code wholesale into a lower-level JVM language. That said, I'm willing to rewrite just this hotspot in something else if necessary, though I have no idea how to interface with it or what the overhead would be.
In addition to the single-threaded slowness of this method, I also can't get the program to scale much past 4 CPUs in terms of parallelization. Given that it spends almost all its time in the 10-line hotspot I've posted, I can't figure out what the bottleneck could be here.
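For what it's worth, since Jython has no GIL, plain threads can in principle split this scan across cores. A minimal sketch of that idea (num_workers and the striding scheme are purely illustrative, not how my real dispatch works):

import threading

def topScoreThreaded(self, seq, num_workers=4):
    # Sketch only: stride the window start positions across Jython threads.
    logProbs = self.logProbs
    l = len(logProbs)
    n = len(seq) - l + 1
    results = [-1e9999] * num_workers

    def worker(k):
        best = -1e9999
        for i in xrange(k, n, num_workers):
            score = 0.0
            for j in xrange(l):
                score += logProbs[j][seq[j + i]]
            best = max(best, score)
        results[k] = best

    threads = [threading.Thread(target=worker, args=(k,)) for k in xrange(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(results)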
If topScore is called repeatedly for the same seq, you could memoize its value. E.g. http://code.activestate.com/recipes/52201/
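A minimal sketch of that suggestion, assuming seq itself is used as the cache key (the method name and per-instance cache attribute are illustrative):

def topScoreMemo(self, seq):
    # Sketch: cache results per instance; only pays off if the same seq recurs.
    cache = getattr(self, '_topScoreCache', None)
    if cache is None:
        cache = self._topScoreCache = {}
    if seq not in cache:
        cache[seq] = self.topScore(seq)
    return cache[seq]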
It is slow because the nested loops make it O(N*l): every length-l window is scored from scratch. The maximum subsequence algorithm may help you improve this.
What about precomputing xrange(l) outside the for i loop?
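A sketch of that micro-optimization; a precomputed list avoids rebuilding an xrange object for every window, though whether it helps under Jython would need measuring:

def topScore(self, seq):
    ret = -1e9999
    logProbs = self.logProbs  # save indirection
    l = len(logProbs)
    inner = range(l)  # built once instead of calling xrange(l) per window
    for i in xrange(len(seq) - l + 1):
        score = 0.0
        for j in inner:
            score += logProbs[j][seq[j + i]]
        ret = max(ret, score)
    return ret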