The difference mainly arises with mutable vs immutable types.
__new__ accepts a type as the first argument, and (usually) returns a new instance of that type. Thus it is suitable for use with both mutable and immutable types.
__init__ accepts an instance as the first argument and modifies the attributes of that instance. This is inappropriate for an immutable type, as it would allow them to be modified after creation by calling obj.__init__(*args).
Compare the behaviour of tuple and list:
>>> x = (1, 2)
>>> x
(1, 2)
>>> x.__init__([3, 4])
>>> x # tuple.__init__ does nothing
(1, 2)
>>> y = [1, 2]
>>> y
[1, 2]
>>> y.__init__([3, 4])
>>> y # list.__init__ reinitialises the object
[3, 4]
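To make the division of labour concrete, here is a minimal sketch showing what each method receives. The `Point` class and the `calls` list are hypothetical names introduced only for this illustration.

```python
calls = []  # records which hook ran, and what it was given


class Point:
    def __new__(cls, x, y):
        # `cls` is the class itself; no instance exists yet.
        calls.append(("__new__", cls.__name__))
        return super().__new__(cls)

    def __init__(self, x, y):
        # `self` is the instance that __new__ just returned.
        calls.append(("__init__", type(self).__name__))
        self.x = x
        self.y = y


p = Point(1, 2)
print(calls)
```

Both methods receive the same constructor arguments; only the first argument differs (the class for `__new__`, the instance for `__init__`).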
As to why they're separate (aside from simple historical reasons): __new__ methods require a bunch of boilerplate to get right (the initial object creation, and then remembering to return the object at the end). __init__ methods, by contrast, are dead simple, since you just set whatever attributes you need to set.
Aside from __init__ methods being easier to write, and the mutable vs immutable distinction noted above, the separation can also be exploited to make calling the parent class __init__ in subclasses optional by setting up any absolutely required instance invariants in __new__. This is generally a dubious practice though - it's usually clearer to just call the parent class __init__ methods as necessary.
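A rough sketch of that "invariants in __new__" trick, assuming a hypothetical `Base` class that must always have a `_handlers` list, and a `Child` subclass that never calls the parent initialiser:

```python
class Base:
    def __new__(cls, *args, **kwargs):
        instance = super().__new__(cls)
        # Required invariant is set here, so it holds even if a
        # subclass never calls Base.__init__.
        instance._handlers = []
        return instance

    def register(self, handler):
        self._handlers.append(handler)


class Child(Base):
    def __init__(self, name):
        # Deliberately does not call super().__init__().
        self.name = name


child = Child("example")
child.register(print)  # works: _handlers was created in __new__
```

As the answer says, this is usually less clear than simply calling the parent `__init__`; the sketch only shows that the separation makes it possible.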
There are probably other uses for __new__ but there's one really obvious one: you can't customise how a subclass of an immutable type is constructed without using __new__. So for example, say you wanted to create a subclass of tuple that can contain only integer values from 0 up to (but not including) size.
class ModularTuple(tuple):
    def __new__(cls, tup, size=100):
        tup = (int(x) % size for x in tup)
        return super(ModularTuple, cls).__new__(cls, tup)
You simply can't do this with __init__: if you tried to modify self in __init__, the interpreter would complain that you're trying to modify an immutable object.
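For comparison, here is a sketch of what the failed `__init__` attempt might look like. `BrokenModularTuple` is a hypothetical name; by the time `__init__` runs, `tuple.__new__` has already fixed the contents.

```python
class BrokenModularTuple(tuple):
    def __init__(self, tup, size=100):
        # The tuple contents were already built by tuple.__new__,
        # so the only option left is item assignment, which tuples
        # do not support.
        for i, x in enumerate(self):
            self[i] = int(x) % size


try:
    BrokenModularTuple([101, 202])
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)
```

The assignment inside `__init__` raises `TypeError`, which is why the modular reduction has to happen in `__new__`, before the tuple exists.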
__new__() can return objects of types other than the class it's bound to. __init__() only initializes an existing instance of the class.
>>> class C(object):
...     def __new__(cls):
...         return 5
...
>>> c = C()
>>> print(type(c))
<class 'int'>
>>> print(c)
5
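A related consequence, sketched below with two hypothetical classes: when `__new__` returns something that is not an instance of the class, Python does not call `__init__` on it at all.

```python
init_calls = []  # names of classes whose __init__ actually ran


class ReturnsInt:
    def __new__(cls):
        return 5  # not an instance of ReturnsInt

    def __init__(self):
        init_calls.append("ReturnsInt")


class ReturnsSelf:
    def __new__(cls):
        return super().__new__(cls)  # a genuine instance of the class

    def __init__(self):
        init_calls.append("ReturnsSelf")


a = ReturnsInt()
b = ReturnsSelf()
print(init_calls)
```

Only `ReturnsSelf.__init__` is recorded, because only `ReturnsSelf.__new__` returned an instance of its own class.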
Not a complete answer but perhaps something that illustrates the difference.
__new__ will always get called when an object has to be created. There are some situations where __init__ will not get called. One example is unpickling: objects loaded from a pickle file get allocated (__new__) but not initialised (__init__).
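The pickle case can be checked with a small sketch. `Tracked` and its two counters are hypothetical names, and the example assumes the default pickling behaviour (no custom `__reduce__` or `__getnewargs__`) and that the code runs as a script, so that pickle can find the class by name.

```python
import pickle


class Tracked:
    new_calls = 0
    init_calls = 0

    def __new__(cls, *args, **kwargs):
        cls.new_calls += 1
        return super().__new__(cls)

    def __init__(self, value):
        type(self).init_calls += 1
        self.value = value


original = Tracked(42)
restored = pickle.loads(pickle.dumps(original))

# __new__ ran for both objects, __init__ only for the original.
print(Tracked.new_calls, Tracked.init_calls, restored.value)
```

The restored object still has its `value` attribute because pickle restores the saved instance state directly rather than re-running `__init__`.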
Just want to add a word about the intent (as opposed to the behavior) of defining __new__ versus __init__.
I came across this question (among others) when I was trying to understand the best way to define a class factory. I realized that one of the ways in which __new__ is conceptually different from __init__ is the fact that the benefit of __new__ is exactly what was stated in the question:
So the only benefit of the __new__ method is that the instance variable will start out as an empty string, as opposed to NULL. But why is this ever useful, since if we cared about making sure our instance variables are initialized to some default value, we could have just done that in the __init__ method?
Considering the stated scenario, we care about the initial values of the instance variables when the instance is in reality a class itself. So, if we are dynamically creating a class object at runtime and we need to define/control something special about the subsequent instances of this class being created, we would define these conditions/properties in a __new__ method of a metaclass.
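As a rough illustration of that idea, here is a minimal metaclass sketch. `DefaultsMeta` and the `label` attribute are hypothetical; the point is only that the metaclass `__new__` decides what each newly created class starts out with.

```python
class DefaultsMeta(type):
    def __new__(mcls, name, bases, namespace, **kwargs):
        # Give every class built by this metaclass a `label`
        # attribute (empty string) unless it defines its own.
        namespace.setdefault("label", "")
        return super().__new__(mcls, name, bases, namespace, **kwargs)


class Plain(metaclass=DefaultsMeta):
    pass


class Named(metaclass=DefaultsMeta):
    label = "named"


# Classes created dynamically at runtime get the same treatment.
Dynamic = DefaultsMeta("Dynamic", (), {})

print(repr(Plain.label), repr(Named.label), repr(Dynamic.label))
```

Instances of these classes then see the default through normal attribute lookup, for example `Plain().label`.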
I was confused about this until I actually thought about the application of the concept rather than just the meaning of it. Here's an example that would hopefully make the difference clear:
# I want `a` and `b` to be instances of either `Square` or `Triangle`,
# depending on the number of sides, and the `.area()` method to do the right
# thing. How do I do that without creating a Shape class with all the
# methods having a bunch of `if`s? Here is one possibility:
class Shape:
    def __new__(cls, sides, *args, **kwargs):
        # The returned object is not a Shape, so Shape.__init__ is
        # never called on it.
        if sides == 3:
            return Triangle(*args, **kwargs)
        else:
            return Square(*args, **kwargs)

class Triangle:
    def __init__(self, base, height):
        self.base = base
        self.height = height

    def area(self):
        return (self.base * self.height) / 2

class Square:
    def __init__(self, length):
        self.length = length

    def area(self):
        return self.length * self.length

# Usage (after the classes are defined):
a = Shape(sides=3, base=2, height=12)
b = Shape(sides=4, length=2)
print(a.area())
print(b.area())
Note that this is just a demonstrative example. There are multiple ways to get a solution without resorting to a class factory approach like the one above, and even if we do choose to implement the solution in this manner, a few caveats have been left out for the sake of brevity (for instance, declaring the metaclass explicitly).
If you are creating a regular class (a.k.a. a non-metaclass), then __new__ doesn't really make sense unless it is a special case like the mutable versus immutable scenario in ncoghlan's answer (which is essentially a more specific example of the concept of defining the initial values/properties of the class/type being created via __new__, to be then initialized via __init__).