Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I refactor 100s of Class Methods in Python?

Tags:

python

I am working on some legacy code (created by someone in love with spaghetti code) that has over 150 getters and over 150 setters. The getters look like this:

def GetLoadFee(self):
    r_str = ""
    if len(self._LoadFee) > 20:
        r_str = self._LoadFee[:20]
    else:
        r_str = self._LoadFee.strip()
    return r_str.strip()

def GetCurrency(self):
    r_str = ""
    if len(self._Currency) > 3:
        r_str = self._Currency[:3]
    else:
        r_str = self._Currency.strip()
    return r_str.strip()

I would love to take the contents of each of these Getters and put them into a decorator /closure or some other method to make this code easier to maintain. The Setters are all one liners, so they're not as important. But they are basically all the same too. Any ideas to make this less painful?

NOTE: I still need the original Getter names as they are used in other programs as this nasty script is used in lots of other legacy code.

like image 326
Mike Driscoll Avatar asked Jan 11 '13 22:01

Mike Driscoll


3 Answers

def make_generic_getter(name, maxlen):
    def getter(self):
        value = getattr(self, name)
        r_str = ""
        if len(value) > maxlen:
            r_str = value[:maxlen]
        else:
            r_str = value.strip()
        return r_str.strip()
    return getter

Now, you can do this:

class Foo(object):
    def __init__(self):
        self._Bar = 'abc'
        self._Baz = 'def'
    GetBar = make_generic_getter('_Bar', 5)
    GetBaz = make_generic_getter('_Baz', 2)

Then:

>>> f = Foo()
>>> f.GetBar()
'abc'
>>> f.GetBaz()
'de'

Clearly, there's also a lot of repetitive and unnecessary stuff in the original function. (And it would be much better to use PEP8-style names for your properties.) But obviously it's much easier to refactor first, then improve, than the other way around. (In other words, start here, but don't stop here.)

From the comments:

How does the method maker get the "self" reference?

The method maker doesn't actually get the self reference. There is no self reference to get at the time the method maker is being called. But there also is no self reference to get for a normal method at the time the class is being defined. In either case, you're just defining a function that takes self as its first parameter, and it somehow magically gets the appropriate self when you call it.

To really understand how this actually works, you need to know about descriptors. See Implementing Descriptors and Invoking Descriptors (or the 3.3 version), read it over a few times, look at how the @property decorator is implemented, play around in the interactive interpreter, give up, go to sleep, and try again tomorrow, and it should all click. But it's easier if you learn the magic version first, so let's do that, using a simpler example:

>>> def func(self): pass
>>> class C(object):
...     def meth(self): pass
...     fake1 = func
>>> C.fake2 = func
>>> func, C.meth, C.fake1, C.fake2
(<function __main__.func>, <unbound method C.meth>, <unbound method C.func>, <unbound method C.func>)

An unbound method is just a thing with an im_class holding its class, an im_func holding a normal function, and an im_self holding None. And when you do fake1 = func in the class definition, or C.fake2 = func after the fact, you don't actually end up with func itself as the value of fake1 or fake2, but with an unbound method wrapped around func, its im_class pointing at C.

>>> c = C()
>>> c.meth, c.fake1
(<bound method C.meth of <__main__.C object at 0x111ebb0d0>>, <bound method C.meth of <__main__.C object at 0x111ebb0d0>>)

When you take an instance of a class, all of its unbound methods become bound methods. If you look at the bound methods' attributes, they're the same as the unbound methods, except that im_self is c instead of None. And when you call c.fake1(), that's how it works—Python sees that c.fake1 is a bound method, so, in effect, it calls c.fake1.im_func(c.fake1.im_self). And that's how fake gets its self parameter.

(This all becomes simpler in Python 3, because there's no such thing as unbound methods anymore, but I assume you care more about Python 2 given that you're dealing with a huge mess of legacy code.)

like image 149
abarnert Avatar answered Oct 20 '22 16:10

abarnert


You don't necessarily need to create getter/setter methods at class-creation time. You can also create callables as-needed:

class MyClass(object):
    # get/set properties for this class: {'Name':length}
    __properties = {'LoadFee':20, 'Currency':3}

    def __init__(self):
        self._Currency = '01 34'
        self._LoadFee = 'lorem ipsum dolor sit amet consecuti'

    def __getattr__(self, name):
        basename = name[3:]
        attrname = '_'+basename
        if basename in self.__properties:
            if name.startswith('Get'):
                return lambda : getattr(self, attrname)[:self.__properties[basename]].strip()
            elif name.startswith('Set'):
                return lambda value: setattr(self, attrname, value)
        raise AttributeError(name)

m = MyClass()

print m.GetCurrency()
print m.GetLoadFee()

Although this approach is easy to understand and doesn't use any metaprogramming voodoo, it's slow and defeats introspection.

You can speed this up by "reifying" methods as you call them, i.e., attaching an instancemethod to the class as attributes on instances of the class are accessed.

# MethodType is not actually necessary because
# everything it does can be done with normal Python
# but it will make our dynamic methods look as "normal"
# and not-dynamic as possible to introspection
from types import MethodType

class MyClass(object):
    # get/set properties for this class: {'Name':length}
    __properties = {'LoadFee':20, 'Currency':3}

    def __init__(self, **args):
        props = self.__properties
        emptystr = ''
        for k in props:
            setattr(self, '_'+k, args.get(k, emptystr))

    def __getattr__(self, name):
        print '__getattr__(%s)' % name
        # we can cache accesses by "reifying" our getters/setters as they are accessed
        cls = self.__class__
        basename = name[3:]
        attrname = '_'+basename
        # nested lambdas are just for delayed evaluation
        # they cache property size lookup--assumes __properties is class-constant!
        def getsize():
            return cls.__properties[basename]
        methodfactories = {
            'Get': lambda size: lambda self: getattr(self, attrname)[:size].strip(),
            'Set': lambda size: lambda self, value: setattr(self, attrname, value),
        }
        try:
            print '  creating', name
            callable = methodfactories[name[:3]](getsize())
        except (KeyError, AttributeError) as e:
            raise AttributeError("'{}' object has no attribute '{}'".format(cls.__name__, name))
        callable.__name__ = name #cosmetics
        unboundmethod = MethodType(callable, None, cls)
        setattr(cls, name, unboundmethod) # add unbound method to the class
        # magically get bound method on the instance!
        # this works because MethodType creates a descriptor that
        # returns a bound callable in an instance context
        # and an unbound one in a class context
        return getattr(self, name) # not an infinite loop!

If you then run the following code:

m = MyClass(Currency='01', LoadFee='lorem ipsum dolor sit')
n = MyClass(Currency='02', LoadFee='amet consecuti')
try:
    # AttributeError because it hasn't been used by an instance
    MyClass.GetCurrency 
except AttributeError, e:
    print ' 7:', e
print ' 8:', m.GetCurrency()
print ' 9:', MyClass.GetCurrency
print '10:', m.GetCurrency
print '11:', n.GetCurrency
print '12:', m.GetCurrency is n.GetCurrency
print '13:', n.GetCurrency()
print '14:', m.GetLoadFee()
print '15:', m.__dict__ # no per-instance callable!

You will get the following result:

 7: type object 'MyClass' has no attribute 'GetCurrency'
 8: __getattr__(GetCurrency)
  creating GetCurrency
01
 9: <unbound method MyClass.GetCurrency>
10: <bound method MyClass.GetCurrency of <__main__.MyClass object at 0x106f87b90>>
11: <bound method MyClass.GetCurrency of <__main__.MyClass object at 0x106f87f10>>
12: False
13: 02
14: __getattr__(GetLoadFee)
  creating GetLoadFee
lorem ipsum dolor si
15: {'_Currency': '01', '_LoadFee': 'lorem ipsum dolor sit'}

Notice that getattr is only called the first time any instance accesses a special property. After that a bound method is returned from the instancemethod we dynamically created and attached to the instance's class. After the first access of an attribute, the classes and instances will be nearly indistinguishable from a method we created the "normal" way and will have exactly the same runtime speed.

like image 3
Francis Avila Avatar answered Oct 20 '22 14:10

Francis Avila


You could try something like this:

def getter(attr, length):
    def wrapper(self):
        value = getattr(self, attr)
        return value[:length].strip()
    return wrapper

GetCurrency = getter("_Currency", 3)

Because you can't overshoot the end of a string when slicing, the length test is no longer necessary.

like image 1
BenTrofatter Avatar answered Oct 20 '22 15:10

BenTrofatter