Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Trade off between code duplication and performance

Python, being the dynamic language that it is, offer multiple ways to implement the same feature. These options may vary in readability, maintainability and performance. Even though the usual scripts that I write in Python are of a disposable nature, I now have a certain project that I am working on (academic) that must be readable, maintainable and perform reasonably well. Since I haven't done any serious coding in Python before, including any sort of profiling, I need help in deciding the balance between the three factors I mentioned above.

Here's a code snippet from one of the modules in a scientific package that I am working on. It is an n-ary Tree class with a very basic skeleton structure. This was written with inheritance and sub classing in mind.

Note : in the code below a tree is the same thing as a node. Every tree is an instance of the same class Tree.

class Tree(object):

    def __init__(self, parent=None, value=None):
        self.parent = parent
        self.value = value
        self.children = set()

The two functions below belongs to this class (along with many others)

    def isexternal(self):
        """Return True if this is an external tree."""
        return not bool(self.children)

    def isleaf(self):
        """Return True if this is a leaf tree."""
        return not bool(self.children)

Both these functions are doing exactly the same thing - they are just two different names. So, why not change it to something like:

    def isleaf(self):
        """Return True of this is a leaf tree."""
        return self.isexternal()

My doubts are these :

I've read that function calls in Python are rather expensive (creating new stacks for each call), but I don't know if it is a good or bad thing if one function depends on another. How will it affect maintainability. This happens many times in my code, where I call one method from another method to avoid code duplication. Is it bad practice to do this?

Here's another example of this code duplication scenario in the same class:

def isancestor(self, tree):
    """Return True if this tree is an ancestor of the specified tree."""
    return tree.parent is self or (not tree.isroot() 
        and self.isancestor(tree.parent))

def isdescendant(self, tree):
    """Return True if this tree is a descendant of the specified tree."""
    return self.parent is tree or (not self.isroot() 
        and self.parent.isdescendant(tree))

I could instead go for:

def isdescendant(self, tree):
    """Return True if this tree is a descendant of the specified tree."""
    return tree.isancestor(self)
like image 941
Renae Lider Avatar asked Jul 28 '15 22:07

Renae Lider


People also ask

Why do we avoid code duplication?

1. Duplicate code makes your program lengthy and bulky : Many programmers feel that if the software is working properly there is no reason to fix code duplications. You forget that you are just un-necessarily making your software bulky.

How do you reduce code duplication?

To avoid the problem of duplicated bugs, never reuse code by copying and pasting existing code fragments. Instead, put it in a method if it is not already in one, so that you can call it the second time that you need it.

Is it okay to duplicate code?

Duplication is bad, but… It isn't a question of whether you'll remember: it's a question of when you'll forget.” Which makes perfect sense. It's time well spent when you try to make your code streamlined and readable. You'll end up with a cleaner, easier-to-maintain, and more extensible code base as a result.


4 Answers

Very broadly speaking, there are two types of optimization: macro optimizations and micro optimizations. Macro optimizations include things like your choice of algorithms, deciding between different data structures, and the like. Things that can have a big impact on performance and often have large ripple effects on your code base if you change your mind. Switching from a data structure with linear O(n) to one with constant O(1) inserts could be a huge win and well worth the cost of doing it. Adding caching may change a dog slow algorithm into a lightning fast one.

Micro optimizations are things like eliding or inlining function calls, eliminating or adding variables, caching calculation results for a very short window, unrolling loops, etc. As a rule, you should forget about these types of optimizations and focus on the readability and maintainability of your code. The effects of micro optimizations are simply too small to be worth it.

You should only consider these types of changes after profiling your code. If you can identify a critical loop that would benefit from such an optimization, and your profiling confirms it would, and you make the change and verify the improvement worked with another round of profiling--then you should micro optimize.

But until then, don't sweat the small stuff.

def isdescendant(self, tree):
    """Return True if this tree is a descendant of the specified tree."""
    return tree.isancestor(self)

I would absolutely recommend this type of code reuse. It makes it crystal clear that isdescendant is the inverse of isancestor. It ensures that both functions work the same way so you can't inadvertantly introduce a bug in one but not the other.

def isleaf(self):
    """Return True of this is a leaf tree."""
    return self.isexternal()

Here I would ask myself if isleaf and isexternal are conceptually the same. Ignoring that they're implemented the same, are they logically identical? If so, I would have one call the other. If it's just happenstance that they have the same implementation, I might duplicate the code. Can you imagine a scenario where you would want to change one function but not the other? That would point towards duplication.

like image 141
John Kugelman Avatar answered Oct 04 '22 18:10

John Kugelman


This approach works well without introducing additional stack frames.

def isexternal(self):
    """Return True of this is an external tree."""
    return not bool(self.children)

isleaf = isexternal

In your second case, the algorithms are fundamentally different between the two methods. I think the solution you propose is fine.

like image 39
David K. Hess Avatar answered Oct 04 '22 19:10

David K. Hess


Just a small test:

>>> timeit('a()', setup="def a(): pass")
0.08267402648925781
>>> timeit('1+1')
0.03854799270629883

So, a simple function call has less than 2.5 times running time compared to a simple arithmetic expression. I don't think, it counts as "rather expensive".

like image 23
Klaus D. Avatar answered Oct 04 '22 20:10

Klaus D.


David Hess's answer is good...

Except, it's neither optimal nor canonical Python to say not bool(x).

not x gives exactly the same results, and is one global lookup and one function call cheaper.

Also, if you're using self.parent twice in the same call, you might consider whether you want to put it in a local -- parent = self.parent -- because local lookups are a lot cheaper than instance lookups. Of course, you should run timeit to make sure you're getting a benefit.

like image 20
Patrick Maupin Avatar answered Oct 04 '22 20:10

Patrick Maupin