
How can I measure the coverage (in production system)?

I would like to measure the coverage of my Python code which gets executed in the production system.

I want an answer to this question:

Which lines get executed often (hot spots) and which lines are never used (dead code)?

Of course this must not slow down my production site.

I am not talking about measuring the coverage of tests.

asked May 22 '20 at 08:05 by guettli



1 Answer

I assume you are not talking about test-suite code coverage, which the other answer refers to. That is indeed a job for CI.

If you want to know which code paths are hit often in your production system, then you're going to have to do some instrumentation / profiling. This will have a cost: you cannot add measurements for free. You can do it cheaply, though, and typically you would run it only for a short time, just long enough to collect your data.

Python has cProfile for full (deterministic) profiling: it records every function call, so you get call counts and timings per function. This will give you the most accurate data but will likely have a relatively high impact on performance.
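As a rough illustration of the cProfile route, here is a minimal sketch using only the standard library. The function `handle_request` is a hypothetical stand-in for whatever production code you want to measure, and `profile_call` is just an illustrative helper name, not an existing API:

```python
import cProfile
import io
import pstats


def handle_request(n):
    # Hypothetical stand-in for real production work.
    return sum(i * i for i in range(n))


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        # Always stop profiling, even if func raises.
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the 10 entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


result, report = profile_call(handle_request, 10000)
print(report)
```

The report lists call counts per function, which covers the "hot spots" part of the question for the code that ran inside the profiled call. Wrapping only selected calls like this keeps the overhead limited to the requests you choose to profile.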

Alternatively, you can do statistical profiling, which basically means you sample the stack on a timer instead of instrumenting everything. This can be much cheaper, even with a high sampling rate! The downside, of course, is a loss of precision.

Even though it is surprisingly easy to do in Python, this stuff is still a bit much to put into an answer here. There is an excellent blog post by the Nylas team on this exact topic though.

The sampler below is adapted from the Nylas blog post with some tweaks. Once started, it receives a SIGVTALRM signal after every millisecond of CPU time the process consumes (that is what ITIMER_VIRTUAL measures) and records the current call stack:

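import collections
import signal

class Sampler(object):
    def __init__(self, interval=0.001):
        # Maps a formatted call stack to the number of times it was sampled.
        self.stack_counts = collections.defaultdict(int)
        self.interval = interval  # seconds between samples

    def start(self):
        # Signal handlers can only be installed from the main thread,
        # and setitimer is only available on Unix.
        signal.signal(signal.SIGVTALRM, self._sample)
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)

    def _sample(self, signum, frame):
        # Walk from the interrupted frame back to the outermost caller.
        stack = []
        while frame is not None:
            formatted_frame = '{}({})'.format(
                frame.f_code.co_name,
                frame.f_globals.get('__name__'))
            stack.append(formatted_frame)
            frame = frame.f_back
        # Store the stack outermost-first, frames separated by ';'.
        formatted_stack = ';'.join(reversed(stack))
        self.stack_counts[formatted_stack] += 1
        # The timer is one-shot, so re-arm it for the next sample.
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)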

You then inspect stack_counts to see what your program has been up to. This data can be plotted as a flame graph, which makes it obvious in which code paths your program is spending the most time.
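To get from stack_counts to a flame graph, one option is to dump the counts in the "folded stacks" text format (one line per stack: the frames joined by `;`, then a space and the sample count), which flame graph tools such as flamegraph.pl accept as input. The following is only a sketch; `to_folded` is a hypothetical helper name and the sample data is made up for illustration:

```python
def to_folded(stack_counts):
    """Format {stack_string: count} as folded-stack lines, most frequent first."""
    ordered = sorted(stack_counts.items(), key=lambda item: -item[1])
    return '\n'.join('{} {}'.format(stack, count) for stack, count in ordered)


# Made-up sample data in the same shape the sampler collects.
sample_counts = {
    '<module>(__main__);main(__main__)': 3,
    '<module>(__main__);main(__main__);work(app)': 42,
}

folded = to_folded(sample_counts)
print(folded)
```

Sorting is not required by the folded format; it is done here only so that the hottest stacks are at the top when you read the output directly.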

answered Nov 15 '22 at 01:11 by Ronald