Python memory management insights -- id()

Tags:

python

Playing around with id(). Began with looking at the addresses of identical attributes in non-identical objects. But that doesn't matter now, I guess. Down to the code:

class T(object):
    pass

class N(object):
    pass

First test (in interactive console):

n = N()
t = T()
id(n)
# prints 4298619728
id(t)
# prints 4298619792

No surprise here, actually. n.__class__ is different from t.__class__, so it seems obvious they can't possibly be the same object. Is the __class__ the only difference between these objects at this time? Assuming no, as:

>>> n1 = N()
>>> n2 = N()
>>> id(n1) == id(n2)
False

Or does Python simply create separate objects even if they are exactly the same, content-wise, instead of assigning the names n1, n2 to, at first, the same object (in memory) and re-assign when either n1 or n2 is modified? Why so? I understand this may be a question of convention, optimization, mood, low-level issues (don't spare me) but still, I'm curious.

Now, same classes as before, T() & N() -- executed one after another in the shell:

>>> id(N())
4298619728
>>> id(N())
4298619792
>>> id(N())
4298619728
>>> id(N())
4298619792

Why the juggling?

But here comes the weird part. Again, same classes, shell:

>>> id(N()), id(T())
(4298619728, 4298619728)
>>> id(N()), id(T())
(4298619728, 4298619728)
>>> id(N()), id(T())
(4298619728, 4298619728)

Not only does the juggling stop, but N() and T() appear to be the same object. Since they cannot be, I understand this as whatever N() returns being destroyed after the id() call, before the end of the whole statement.

I realize this may be a tough one to answer. But I'm hoping someone could tell me what I'm observing here, whether my understanding is correct, share some dark magic about the inner workings and memory management of the interpreter or perhaps point to some good resources on this subject?

Thanks for your time on this one.

asked Jul 18 '11 13:07

maligree

2 Answers

You asked a lot of questions. I'll do my best to answer some of them, and hopefully you'll be able to figure out the rest (ask if you need help).

First question: explain behaviour of `id`

>>> n1 = N()
>>> n2 = N()
>>> id(n1) == id(n2)
False

This shows that Python creates a new object each time you call an object constructor. This makes sense, because this is exactly what you asked for! If you wanted to allocate only one object, but give it two names, then you could have written this:

>>> n1 = N()
>>> n2 = n1
>>> id(n1) == id(n2)
True
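The same distinction can be checked with the `is` operator, which compares identity directly. This is a minimal sketch in Python 3 syntax; the name `alias` is only for illustration, and N is the empty class from the question.

```python
class N(object):
    pass

n1 = N()
n2 = N()     # a second constructor call creates a second object
alias = n1   # no constructor call: just another name for the same object

print(n1 is n2)             # False
print(n1 is alias)          # True
print(id(n1) == id(alias))  # True
```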

Second question: why not copy-on-write?

You go on to ask why Python doesn't implement a copy-on-write strategy for object allocation. Well, the current strategy, of constructing an object every time you call a constructor, is:

simple to implement;
explicit (does exactly what you ask for);
easy to document and understand.

Also, the use cases for copy-on-write are not compelling. It only saves storage if many identical objects get created and are never modified. But in that case, why create many identical objects? Why not use a single object?

Third question: explain allocation behaviour

In CPython, the id of an object is (secretly!) its address in memory. See the function builtin_id in bltinmodule.c, line 907.
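One way to see this without reading the C source is to compare id() with the address printed in an object's default repr. The following is a sketch in Python 3 syntax that assumes CPython (other implementations may format the repr differently, or not use addresses at all); the helper name address_from_repr is made up for this example.

```python
class N(object):
    pass

def address_from_repr(obj):
    """Parse the hex address out of the default repr, which on CPython
    looks like '<__main__.N object at 0x10037a8d0>'."""
    text = object.__repr__(obj)
    hex_part = text.rsplit(" at ", 1)[1].rstrip(">")
    return int(hex_part, 16)

n = N()
print(hex(id(n)))
print(object.__repr__(n))
print(address_from_repr(n) == id(n))  # expected True on CPython only
```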

You can investigate Python's memory allocation behaviour by making a class with __init__ and __del__ methods:

class N:
    def __init__(self):
        print "Creating", id(self)
    def __del__(self):
        print "Destroying", id(self)

>>> id(N())
Creating 4300023352
Destroying 4300023352
4300023352

You can see that Python was able to destroy the object immediately, which allows it to reclaim the space for re-use by the next allocation. Python uses reference counting to keep track of how many references there are to each object, and when there are no more references to an object, it gets destroyed. Within the execution of the same statement, the same memory may get re-used several times. For example:

>>> id(N()), id(N()), id(N())
Creating 4300023352
Destroying 4300023352
Creating 4300023352
Destroying 4300023352
Creating 4300023352
Destroying 4300023352
(4300023352, 4300023352, 4300023352)
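For readers on Python 3, here is a rough equivalent of the experiment above that records events in a list instead of printing them. It is a sketch that relies on CPython's reference counting; on other implementations __del__ may run later or not at all. The names events and keep are only for illustration.

```python
events = []

class N(object):
    def __init__(self):
        events.append(("create", id(self)))

    def __del__(self):
        events.append(("destroy", id(self)))

# The temporary object has no references left once id() returns,
# so CPython destroys it before the next statement runs.
id(N())
print([kind for kind, _ in events])

# Holding a reference keeps the object alive until the name is deleted.
keep = N()
print(events[-1][0])   # the most recent event is the creation of keep
del keep
print(events[-1][0])   # now the most recent event is its destruction
```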

Fourth question: explain the "juggling"

I am afraid I cannot reproduce the "juggling" behaviour you exhibit (where alternately created objects get different addresses). Can you give more details, such as Python version and operating system? What results do you get if you use my class N?

OK, I can reproduce the juggling if I make my class N inherit from object.

I have a theory about why this happens, but I have not checked it in a debugger, so please take it with a pinch of salt.

First, you need to understand a bit about how Python's memory manager works. Go read through obmalloc.c and come back when you're done. I'll wait.

...

All understood? Good. So now you know that Python manages small objects by sorting them into pools by size: each 4 KiB pool contains objects in a small range of sizes, and there's a free list to help the allocator to quickly find a slot for the next object to be allocated.

Now, the Python interactive shell is also creating objects: the abstract syntax tree and the compiled byte code, for example. My theory is that when N is a new-style class, its size is such that it goes into the same pool as some other object that is allocated by the interactive shell. So the sequence of events looks something like this:

User enters id(N())
Python allocates a slot in pool P for the object just created (call this slot A).
Python destroys the object and returns its slot to the free list for pool P.
The interactive shell allocates some object, call it O. This happens to be the right size to go into pool P, so it gets slot A that was just freed.
User enters id(N()) again.
Python allocates a slot in pool P for the object just created. Slot A is full (still contains object O), so it gets slot B instead.
The interactive shell forgets about object O, so it gets destroyed, and slot A is returned to the free list for pool P.

You can see that this explains the alternating behaviour. In the case where the user types id(N()),id(N()), the interactive shell doesn't get a chance to stick its oar in between the two allocations, so they can both go in the same slot in the pool.
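The sequence above can be played out with a toy model. This is only a sketch of the theory, not of obmalloc itself: ToyPool, repl_session and the slot names "A", "B", "C" are all invented for the example, and the pool is reduced to a simple last-in, first-out free list.

```python
class ToyPool:
    """A toy pool whose free list hands out the most recently freed slot first."""

    def __init__(self, slots):
        # Stored reversed so that pop() returns the first slot in `slots`.
        self.free = list(reversed(slots))

    def alloc(self):
        return self.free.pop()

    def release(self, slot):
        self.free.append(slot)


def repl_session(pool, statements, shell_allocates):
    """Return the slot that each id(N()) statement would report."""
    seen = []
    held = None  # the slot of the shell's object O, if it currently holds one
    for _ in range(statements):
        slot = pool.alloc()      # N() is created
        seen.append(slot)
        pool.release(slot)       # N() is destroyed as soon as id() returns
        if shell_allocates:
            if held is None:
                held = pool.alloc()   # the shell allocates O in the slot just freed
            else:
                pool.release(held)    # the shell forgets O
                held = None
    return seen


print(repl_session(ToyPool(["A", "B", "C"]), 4, shell_allocates=True))
# ['A', 'B', 'A', 'B']
print(repl_session(ToyPool(["A", "B", "C"]), 3, shell_allocates=False))
# ['A', 'A', 'A']
```

With the shell allocating between statements, the reported slot alternates; without it, the same slot is reused every time, which matches the two behaviours in the question.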

This also explains why it doesn't happen for old-style objects. Presumably the old-style objects are a different size, so they go in a different pool, and don't share slots with whatever objects the interactive shell is creating.

Fifth question: what objects might the interactive shell be allocating?

See pythonrun.c for the details, but basically the interactive shell:

Reads your input and allocates strings containing your code.
Calls the parser, which constructs an abstract syntax tree describing the code.
Calls the compiler, which constructs the compiled byte code.
Calls the evaluator, which allocates objects for stack frames, locals, globals etc.

I don't know exactly which of these objects is to blame for the "juggling". Not the input strings (strings have their own specialized allocator); not the abstract syntax tree (it gets thrown away after it's been compiled). Maybe it's the byte code object.
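The stages listed above can be imitated from Python code with the ast module and the compile and exec builtins. This is only an analogy in Python 3 syntax: the real shell does this work in C, and the objects it allocates internally need not match the ones created here. The names source, tree, code and namespace are chosen for the example.

```python
import ast

# 1. The input line is held as a string.
source = "x = id(object())\n"

# 2. The parser builds an abstract syntax tree ("single" is the mode
#    used for one interactive statement).
tree = ast.parse(source, filename="<stdin>", mode="single")

# 3. The compiler turns the tree into a code object holding the byte code.
code = compile(tree, "<stdin>", "single")

# 4. The evaluator runs the code object in a namespace of globals.
namespace = {}
exec(code, namespace)

print(type(tree).__name__)   # Interactive
print(type(code).__name__)   # code
print(isinstance(namespace["x"], int))  # True
```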

answered Nov 03 '22 01:11

Gareth Rees

The documentation says it all:

id(object):

Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

Whenever you call a constructor, this creates a new object. The object has an id that's different from the id of any other object that's currently alive.

>>> n1 = N()
>>> n2 = N()
>>> id(n1) == id(n2)
False

The "contents" of the two objects do not matter. They are two distinct entities; it seems perfectly logical that they would get different ids.

In CPython, ids are simply memory addresses. They do get recycled: if an object gets garbage collected, another object created at some point in the future might get the same id. This is the behaviour you're seeing in your repeated id(N()), id(T()) tests: since you're not keeping references to the newly created objects, the interpreter is free to garbage collect them and reuse their ids.

The recycling of ids is clearly an implementation/platform artefact and should not be relied upon.
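A small sketch of the difference a live reference makes, in Python 3 syntax. The names alive, ids_alive and ids_temp are only for illustration. Only the first result is guaranteed by the documentation quoted above; the second depends on the implementation.

```python
class N(object):
    pass

# Keeping references: all five objects are alive at once,
# so their ids are guaranteed to differ.
alive = [N() for _ in range(5)]
ids_alive = [id(obj) for obj in alive]
print(len(set(ids_alive)))  # 5

# Not keeping references: each object may be freed before the next one
# is created, so ids may repeat (common on CPython, but not guaranteed).
ids_temp = [id(N()) for _ in range(5)]
print(len(set(ids_temp)))
```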

answered Nov 03 '22 00:11

NPE
