
Python class methods: when is self not needed

Tags:

python

I'm trying to rewrite some code using classes. At some point, I want to assign a member function a particular definition for each instance of an object, chosen by a parameter value.

Coming from other languages (JavaScript, C++, Haskell, Fortran, ...), I am struggling to understand a few things in Python. One of them is the following distinction in the use of self in class methods.

For instance, the following code obviously won't work:

class fdf:
    def f(x):
        return 666

class gdg(fdf):
    def sq():
        return 7*7

hg = gdg()
hf = fdf()
print(hf.f(),hg.f(),hg.sq())

which gives the error that "sq() takes 0 positional arguments but 1 was given".

The reason, as I understand it, is that at execution time the function is passed a reference to the calling object (the instance calling sq) as first argument before any other parameter/argument we may have defined/called sq with. So the solution is simple: change the code of sq to def sq(self):. Indeed, the Python tutorial 1 seems to suggest that object methods should always be defined with self as first parameter. Doing so we get as expected 666 666 49. So far so good.

However, when I try to implement my class like this:

class Activation:
    def nonLinearBipolarStep(self,x,string=None):
        if not string: return (-1 if x<0 else 1 )
        else: return ('-' if x<0 else '1')

    default='bipolar'
    activationFunctions = {
        'bipolar': nonLinearBipolarStep ,
    }
    def _getActivation(self,func=default):
        return self.activationFunctions.get(func,self.activationFunctions.get(self.default))

    def __init__(self,func=None):
        if func == None: func=self.default 
        self.run = self._getActivation(func)


ag = Activation()
print(ag.run(4))

I get the error

nonLinearBipolarStep() missing 1 required positional argument: 'x'

Yet a workaround (solution?) is to define the step function without the parameter self (!) as

def nonLinearBipolarStep(x,string=None):

Then I get the expected behavior (at least for this trivial test) of 1. So not only is self not needed here, but using it here is even incorrect!

But according to the tutorial mentioned above, or to the answers in threads like this 2 or this 3, it seems to me this code shouldn't work...or should have some unexpected consequences at some point(?). Indeed, if I remove all references to self in the definition of _getActivation I get the error message _getActivation() takes from 0 to 1 positional arguments but 2 were given which I can understand according to that rule.

The thread "Why is self not used in this method" 4 does not provide a clear answer to me: What syntax detail of the code above tells me that self is not needed? For instance, how is that code different from this tutorial example

class MyClass:
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'

? Instantiating this class works as expected, but it complains about a missing parameter (I know it could be any label) if f is defined with none.

This makes me question whether my code is not hiding a time bomb somehow: is self passed as the value for x? It works as expected so I'd say no, but then I'm facing this conundrum.

I guess I'm missing some key ideas of the language. I admit I also struggle with the question the OP of reference 3 is asking^.

[^]: In JS one just uses this in the function body, and the function itself is defined either as member of the object's prototype or as an instance member which then gets assigned correctly using...this.

EDIT: The thread is long. For those browsing for some help: if you are new to Python, you may want to check the selected solution and its comments. However, if you already know about bound/unbound methods in Python, you may want to go directly to the use of descriptors as described in Blckknght's answer. I finally opted for this approach in my code, using __get__ in the assignment to run.

MASL asked Jun 24 '17 21:06


4 Answers

I think what has confused you here is that you're accessing the method via the class attribute activationFunctions, rather than (as a method would normally be accessed) on the instance itself. For example, given:

class Class:

    def method(self, foo, bar):
        print(self, foo, bar)

    methods = {'method': method}

When we call the method directly from the dictionary:

>>> Class.methods['method'](1, 2, 3)
1 2 3

You can see that we're passing 1 as the self parameter; the method isn't being called on an instance, so no instance is being injected. By contrast, when we call it on an instance:

>>> instance = Class()
>>> instance.method(1, 2)
<__main__.Class object at 0x...> 1 2

Now our arguments are foo and bar, and the instance is self. That's why a different number of parameters seems to be required.


In this case, as you don't actually need the instance state in your method, just make it a regular function (note minor revisions for PEP-8 compliance):

def non_linear_bipolar_step(x, string=None):
    # As in the question: a falsy `string` selects the numeric result.
    if not string:
        return -1 if x < 0 else 1
    return '-' if x < 0 else '1'

class Activation:

    activation_functions = {
        'bipolar': non_linear_bipolar_step,
    }

    ...

This is likely less confusing.
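
Filled out with the rest of the question's class, this suggestion could look roughly like the sketch below. It assumes the dictionary lookup with a fallback to the default entry is moved straight into `__init__`; the snake_case names follow this answer rather than the question, and a falsy `string` is assumed to select the numeric result, as in the question.

```python
def non_linear_bipolar_step(x, string=None):
    # Plain module-level function: no self, so nothing needs to be bound.
    if not string:
        return -1 if x < 0 else 1
    return '-' if x < 0 else '1'


class Activation:
    default = 'bipolar'
    activation_functions = {
        'bipolar': non_linear_bipolar_step,
    }

    def __init__(self, func=None):
        if func is None:
            func = self.default
        # Unknown names fall back to the default activation.
        self.run = self.activation_functions.get(
            func, self.activation_functions[self.default]
        )


ag = Activation()
print(ag.run(4))          # 1
print(ag.run(-4))         # -1
print(ag.run(-4, 'yes'))  # -
```

Because the function never expected self, it does not matter that storing it in a dictionary and then in an instance attribute bypasses method binding.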

jonrsharpe answered Oct 16 '22 06:10


You're running into one of the more subtle parts of Python's method implementation. It comes down to how the self argument for normal method calls (e.g. some_instance.method()) is bound. It uses the "descriptor" protocol, which is not very well documented (at least, it's not made obvious to new Python programmers).

A descriptor is an object that has a __get__ method (and optionally __set__ and/or __delete__ method, but I'm only going to talk about __get__ here). When such an object is stored in a class variable, Python will call its __get__ method whenever the corresponding name is looked up on an instance. Note that this special behavior does not happen for descriptor objects stored in instance variables, only those that are class variables.

Functions are descriptors. That means that when you save a function as a class variable, its __get__ method will be called when you look it up on an instance. That method will return a "bound method" object which will pass along the self argument to the function automatically.
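
A small sketch of that mechanism, using a made-up `Greeter` class (the class and method names are only for illustration): calling the function's `__get__` by hand gives the same bound method that ordinary attribute lookup produces.

```python
class Greeter:
    def greet(self, name):
        return f"{type(self).__name__} greets {name}"


g = Greeter()

# The raw function object lives in the class's namespace.
raw = Greeter.__dict__['greet']
print(type(raw).__name__)            # function

# Attribute lookup on the instance calls raw.__get__(g, Greeter) for us.
bound_by_hand = raw.__get__(g, Greeter)
print(type(bound_by_hand).__name__)  # method
print(bound_by_hand == g.greet)      # True
print(bound_by_hand('Ada'))          # Greeter greets Ada
```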

If you store a function somewhere other than a top-level class variable (such as in a dictionary or in an instance variable), you won't get this binding behavior, since the descriptor protocol won't be invoked when the object is looked up. This usually means you either need to pass self manually, or you should omit the self argument from the function definition in the first place (in which case I'd suggest moving the function out of the class to make it clear it's not intended to be used as a method).
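
To make the three storage locations concrete, here is a hedged sketch with a hypothetical `Demo` class. The same function is reached as a top-level class variable, through a dictionary, and through an instance variable; only the first lookup binds self.

```python
class Demo:
    def method(self, x):
        return (self, x)

    # Stored one level down, inside a dict that is itself a class variable.
    table = {'method': method}


d = Demo()

# Top-level class variable: the lookup binds d as self.
print(d.method(1) == (d, 1))              # True

# Same function reached through the dict: no binding, so self must be passed.
print(d.table['method'](d, 2) == (d, 2))  # True

# Stored in an instance variable: no binding either.
d.alias = Demo.__dict__['method']
try:
    d.alias(3)  # 3 is taken as self, so x is missing
except TypeError as exc:
    print(exc)
```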

But you can also construct bound methods by hand if you want to. The type is exposed in the types module, as types.MethodType. So you could change your code like this and it should work:

def __init__(self, func=None):
    if func is None:
        func = self.default
    # requires `import types` at the top of the module
    self.run = types.MethodType(self._getActivation(func), self)
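
For reference, a self-contained sketch of the question's class with this change applied might look like the following. The class body is copied from the question with only whitespace tidied and `== None` replaced by `is None`.

```python
import types


class Activation:
    def nonLinearBipolarStep(self, x, string=None):
        if not string:
            return -1 if x < 0 else 1
        return '-' if x < 0 else '1'

    default = 'bipolar'
    activationFunctions = {
        'bipolar': nonLinearBipolarStep,
    }

    def _getActivation(self, func=default):
        return self.activationFunctions.get(
            func, self.activationFunctions.get(self.default)
        )

    def __init__(self, func=None):
        if func is None:
            func = self.default
        # Bind the plain function from the dict to this instance by hand.
        self.run = types.MethodType(self._getActivation(func), self)


ag = Activation()
print(ag.run(4))              # 1
print(ag.run.__self__ is ag)  # True
```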

Blckknght answered Oct 16 '22 06:10


What is self?

In Python, every normal method is forced to accept a parameter commonly named self. This is an instance of the class, i.e. an object. This is how Python methods interact with an instance's state.

You are allowed to rename this parameter to whatever you please, but it will always have the same value:

>>> class Class:
    def method(foo):
        print(foo)

        
>>> cls = Class()
>>> cls.method()
<__main__.Class object at 0x03E41D90>
>>> 

But then why does my example work?

However, what you are probably confused about is how this code works differently:

>>> class Class:
    def method(foo):
        print(foo)

    methods = {'method': method}
    
    def __init__(self):
        self.run = self.methods['method']

        
>>> cls = Class()
>>> cls.run(3)
3
>>> 

This is because of the distinction between bound, and unbound methods in Python.

When we do this in __init__():

self.run = self.methods['method']

We are referring to the unbound method method. That means that our reference to method is not bound to any specific instance of Class, and thus Python will not force method to accept an object instance, because it does not have one to give.

The above code would be the same as doing this:

>>> class Class:
    def method(foo):
        print(foo)

        
>>> Class.method(3)
3
>>> 

In both examples, we are calling the method method of the class object Class, and not of an instance of Class.

We can further see this distinction by examining the repr for a bound and unbound method:

>>> class Class:
    def method(foo):
        print(foo)

        
>>> Class.method
<function Class.method at 0x03E43D68>
>>> cls = Class()
>>> cls.method
<bound method Class.method of <__main__.Class object at 0x03BD2FB0>>
>>> 

As you can see, in the first example, when we do Class.method, Python shows: <function Class.method at 0x03E43D68>. I've lied to you a little bit. When we have an unbound method of a class, Python treats it as a plain function. So method is simply a function that is not bound to any instance of Class.

However, in the second example, when we create an instance of Class and then access its method object, we see printed: <bound method Class.method of <__main__.Class object at 0x03BD2FB0>>.

The key part to notice is bound method Class.method. That means method is bound to cls, a specific instance of Class.
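
Beyond the repr, the binding can also be inspected directly. As a small sketch: a bound method exposes the instance as `__self__` and the underlying function as `__func__`. The method here returns its argument instead of printing it, purely so the result can be compared.

```python
class Class:
    def method(foo):
        return foo


cls = Class()
bound = cls.method

# A bound method remembers both the instance and the underlying function.
print(bound.__self__ is cls)           # True
print(bound.__func__ is Class.method)  # True

# Calling the bound method is the same as calling the function with the instance.
print(bound() is Class.method(cls))    # True
```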

General remarks

As @jonrsharpe mentioned, writing code like in your example leads to confusion (as proven by this question) and bugs. It would be a better idea to simply define nonLinearBipolarStep() outside of Activation, without self, and reference it from inside Activation.activation_functions:

def nonLinearBipolarStep(x, string=None):
    if not string: return (-1 if x<0 else 1)
    else: return ('-' if x<0 else '1')

class Activation:

    activation_functions = {
        'bipolar': nonLinearBipolarStep,
    }

    ...

I guess a more specific question would be: what should I pay attention to in that code for it to become evident that ag.run(x) would be a call to an unbound function?

If you'd still like to let nonLinearBipolarStep be unbound, then I recommend simply being careful. If you think your approach would make for the cleanest code then go for it, but make sure you know what you are doing and what behavior your code will have.

If you still wanted to make it clear to users of your class that ag.run() would be static, you could document it in a docstring somewhere, but that is something the user really shouldn't have to be concerned with at all.

Christian Dean answered Oct 16 '22 06:10


You're using an unbound method (nonLinearBipolarStep) in this code:

activationFunctions = {
    'bipolar': nonLinearBipolarStep ,
}

Longer answer: methods are functions defined within a class body, and they always take at least one argument, the so-called self (unless you use @staticmethod and turn them into normal functions). self is an object of the given class, the one on which the method is called (like this in C++). In Python there's almost nothing special about this argument; it doesn't have to be named self. Now, when you call an unbound method, the first argument you give is interpreted as self and consumed. If you call a bound method, this consumption doesn't happen (the method already has its self object). For example:

class A:
  def foo(self, x): print(x)
a = A()
a.foo(1) # a.foo is bound method, a is self, prints 1
A.foo(a, 2) # A.foo is unbound method, first argument becomes self, prints 2

UPDATE: Why it works at all. Short answer: because the dot (.) operator will upgrade an unbound method to a bound one when it can.

Consider what happens when you write a.foo(1). First Python checks the object a for foo and finds nothing (foo is not a value assigned to a). So it goes to a's class (named A) and, lo and behold, foo is there and is used. But here is the trick: Python binds the object a to the unbound method A.foo (the details escape me now, so imagine a dragon did it) and turns it into a bound method. So a.foo is bound and doesn't need self from the arguments anymore; thus 1 goes into the argument x and everything works.

Now to your code: in the map you use 'bipolar': nonLinearBipolarStep, which is an unbound method. Then in the constructor (__init__) you set self.run to the value returned from _getActivation, which is taken from the activationFunctions map. In the given example you return the unbound method nonLinearBipolarStep and assign it to self.run. Now you call ag.run. Going by the logic of the previous paragraph, ag.run is first looked up inside the ag object. And here is your error: it's found. As Python found the value of ag.run inside the ag object, it never consulted ag's type (Activation) for run and never had a chance to bind it. So ag.run is an unbound method and expects self as its first argument.

In general you have two options. Either call ag.run(ag, 4), which works but is ugly, or manually bind the method to self in the constructor. The latter you can do like this:
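
The lookup order described above can be checked with a cut-down version of the class. This is only a sketch: `step` and `table` are shortened stand-ins for nonLinearBipolarStep and activationFunctions.

```python
class Activation:
    def step(self, x):
        return -1 if x < 0 else 1

    table = {'bipolar': step}

    def __init__(self):
        # Plain function stored as an instance attribute.
        self.run = self.table['bipolar']


ag = Activation()

# run lives in the instance's own namespace, so the lookup returns it unchanged.
print('run' in vars(ag))          # True
print('run' in vars(Activation))  # False
print(type(ag.run).__name__)      # function

# step is found on the class instead, so it gets bound on the way out.
print('step' in vars(ag))         # False
print(type(ag.step).__name__)     # method
```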

self.run = self._getActivation(func).__get__(self)
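
A runnable sketch of both options side by side, on a trimmed copy of the question's class. The `bind` flag is a hypothetical parameter added only to switch between the two options; it is not part of the original code.

```python
class Activation:
    def nonLinearBipolarStep(self, x, string=None):
        if not string:
            return -1 if x < 0 else 1
        return '-' if x < 0 else '1'

    activationFunctions = {'bipolar': nonLinearBipolarStep}

    def __init__(self, bind):
        func = self.activationFunctions['bipolar']
        # bind=True: bind by hand via __get__; bind=False: keep the plain function.
        self.run = func.__get__(self) if bind else func


unbound = Activation(bind=False)
print(unbound.run(unbound, 4))  # 1  (option 1: pass the instance explicitly)

bound = Activation(bind=True)
print(bound.run(4))             # 1  (option 2: already bound)
```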

Radosław Cybulski answered Oct 16 '22 07:10