I am using python/pysam to do analyze sequencing data. In its tutorial (pysam - An interface for reading and writing SAM files) for the command mate it says:
'This method is too slow for high-throughput processing. If a read needs to be processed with its mate, work from a read name sorted file or, better, cache reads.'
How would you 'cache reads'?
Caching is a typical approach to speed up long running operations. It sacrifices memory for the sake of computational speed.
Let's suppose you have a function which given a set of parameters always returns the same result. Unfortunately this function is very slow and you need to call it a considerable amount of times slowing down your program.
What you could do, is storing a limited amount of {parameters: result} combinations and skip its logic any time the function is called with the same parameters.
It's a dirty trick but quite effective especially if the parameters combination is low compared to the function speed.
In Python 3 there's a decorator for this purpose.
In Python 2 a library can help but you need a bit more work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With