Python memory management insights -- id()

Tags:

python

Playing around with id(). Began with looking at the addresses of identical attributes in non-identical objects. But that doesn't matter now, I guess. Down to the code:

class T(object):
    pass

class N(object):
    pass

First test (in interactive console):

n = N()
t = T()
id(n)
# prints 4298619728
id(t)
# prints 4298619792

No surprise here, actually. n.__class__ is different from t.__class__, so it seems obvious they can't possibly be the same object. Is __class__ the only difference between these objects at this point? I assume not, since:

>>> n1 = N()
>>> n2 = N()
>>> id(n1) == id(n2)
False

Or does Python simply create separate objects even when they are identical content-wise, rather than initially binding the names n1 and n2 to the same object in memory and splitting them only when either one is modified? Why so? I understand this may be a question of convention, optimization, mood, low-level issues (don't spare me), but still, I'm curious.

Now, same classes as before, T() & N() -- executed one after another in the shell:

>>> id(N())
4298619728
>>> id(N())
4298619792
>>> id(N())
4298619728
>>> id(N())
4298619792

Why the juggling?

But here comes the weird part. Again, same classes, shell:

>>> id(N()), id(T())
(4298619728, 4298619728)
>>> id(N()), id(T())
(4298619728, 4298619728)
>>> id(N()), id(T())
(4298619728, 4298619728)

Not only does the juggling stop, but N() and T() appear to be the same object. Since they cannot be, I take this to mean that whatever N() returns is destroyed right after the id() call, before the end of the whole statement.

I realize this may be a tough one to answer. But I'm hoping someone could tell me what I'm observing here, whether my understanding is correct, share some dark magic about the inner workings and memory management of the interpreter or perhaps point to some good resources on this subject?

Thanks for your time on this one.

maligree, asked Jul 18 '11 13:07


2 Answers

You asked a lot of questions. I'll do my best to answer some of them, and hopefully you'll be able to figure out the rest (ask if you need help).

First question: explain behaviour of id

>>> n1 = N()
>>> n2 = N()
>>> id(n1) == id(n2)
False

This shows that Python creates a new object each time you call an object constructor. This makes sense, because this is exactly what you asked for! If you wanted to allocate only one object, but give it two names, then you could have written this:

>>> n1 = N()
>>> n2 = n1
>>> id(n1) == id(n2)
True
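As a small sketch of the same point (written for Python 3, which is an assumption; the sessions above are Python 2), the `is` operator compares identity directly, so there is no need to go through id(). The name `alias` is only illustrative:

```python
class N(object):
    pass

n1 = N()
n2 = N()     # a second constructor call creates a second object
alias = n1   # no constructor call: just another name for the same object

print(n1 is n2)              # False: two distinct live objects
print(n1 is alias)           # True: one object, two names
print(id(n1) == id(alias))   # True: same identity, same id
```

Comparing with `is` also avoids the trap of comparing ids of objects that are no longer alive.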

Second question: why not copy-on-write?

You go on to ask why Python doesn't implement a copy-on-write strategy for object allocation. Well, the current strategy, of constructing an object every time you call a constructor, is:

  1. simple to implement;
  2. explicit (does exactly what you ask for);
  3. easy to document and understand.

Also, the use cases for copy-on-write are not compelling. It only saves storage if many identical objects get created and are never modified. But in that case, why create many identical objects? Why not use a single object?

Third question: explain allocation behaviour

In CPython, the id of an object is (secretly!) its address in memory. See the function builtin_id in bltinmodule.c, line 907.

You can investigate Python's memory allocation behaviour by making a class with __init__ and __del__ methods:

class N:
    def __init__(self):
        print "Creating", id(self)
    def __del__(self):
        print "Destroying", id(self)

>>> id(N())
Creating 4300023352
Destroying 4300023352
4300023352

You can see that Python was able to destroy the object immediately, which allows it to reclaim the space for re-use by the next allocation. Python uses reference counting to keep track of how many references there are to each object, and when there are no more references to an object, it gets destroyed. Within the execution of the same statement, the same memory may get re-used several times. For example:

>>> id(N()), id(N()), id(N())
Creating 4300023352
Destroying 4300023352
Creating 4300023352
Destroying 4300023352
Creating 4300023352
Destroying 4300023352
(4300023352, 4300023352, 4300023352)
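Another way to watch reference counting at work, without a __del__ method, is a weak reference. This is a minimal sketch assuming CPython and Python 3; other implementations (PyPy, for example) may reclaim the object later rather than immediately:

```python
import weakref

class N(object):
    pass

obj = N()
ref = weakref.ref(obj)   # a weak reference does not keep obj alive

alive_before = ref() is not None
del obj                  # drop the only strong reference
alive_after = ref() is not None

# On CPython the object is reclaimed as soon as its reference count hits zero.
print(alive_before, alive_after)
```

On CPython this reports that the object is alive before the `del` and gone right after it, which is the same immediate destruction seen in the Creating/Destroying output above.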

Fourth question: explain the "juggling"

I am afraid I cannot reproduce the "juggling" behaviour you exhibit (where alternately created objects get different addresses). Can you give more details, such as Python version and operating system? What results do you get if you use my class N?

OK, I can reproduce the juggling if I make my class N inherit from object.

I have a theory about why this happens, but I have not checked it in a debugger, so please take it with a pinch of salt.

First, you need to understand a bit about how Python's memory manager works. Go read through obmalloc.c and come back when you're done. I'll wait.

...

All understood? Good. So now you know that Python manages small objects by sorting them into pools by size: each 4 KiB pool contains objects in a small range of sizes, and there's a free list to help the allocator to quickly find a slot for the next object to be allocated.
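To make "a small range of sizes" concrete, here is a rough sketch of how a request size could map to a size class. The two constants are assumptions based on the 8-byte alignment and 256-byte small-object threshold in the obmalloc.c of that era; newer CPython versions use different values, and `size_class` is a hypothetical helper, not a real API:

```python
ALIGNMENT_SHIFT = 3             # assumed: size classes are 8 bytes wide
SMALL_REQUEST_THRESHOLD = 256   # assumed: larger requests bypass the pools

def size_class(nbytes):
    """Return the size-class index for a request, or None if it is not 'small'."""
    if nbytes <= 0 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None
    return (nbytes - 1) >> ALIGNMENT_SHIFT

for n in (1, 8, 9, 64, 65, 256, 257):
    print(n, size_class(n))
```

Under these assumptions, requests of 57 to 64 bytes all land in the same class and so compete for slots in the same pools, whatever type of object they are for.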

Now, the Python interactive shell is also creating objects: the abstract syntax tree and the compiled byte code, for example. My theory is that when N is a new-style class, its size is such that it goes into the same pool as some other object that is allocated by the interactive shell. So the sequence of events looks something like this:

  1. User enters id(N())

  2. Python allocates a slot in pool P for the object just created (call this slot A).

  3. Python destroys the object and returns its slot to the free list for pool P.

  4. The interactive shell allocates some object, call it O. This happens to be the right size to go into pool P, so it gets slot A that was just freed.

  5. User enters id(N()) again.

  6. Python allocates a slot in pool P for the object just created. Slot A is full (still contains object O), so it gets slot B instead.

  7. The interactive shell forgets about object O, so it gets destroyed, and slot A is returned to the free list for pool P.

You can see that this explains the alternating behaviour. In the case where the user types id(N()),id(N()), the interactive shell doesn't get a chance to stick its oar in between the two allocations, so they can both go in the same slot in the pool.
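The theory can be played out with a toy model. Everything here is hypothetical: `ToyPool` stands in for one pool with a last-in, first-out free list, and `run_line` assumes the shell allocates one object O per input line and drops the previous line's O only after that. It is a sketch of the idea, not of what obmalloc.c or the shell actually do:

```python
class ToyPool(object):
    """Hypothetical stand-in for one pool: named slots and a LIFO free list."""

    def __init__(self, slots):
        # Stored reversed so that pop() hands out the first slot first.
        self.free = list(reversed(slots))

    def alloc(self):
        return self.free.pop()

    def release(self, slot):
        self.free.append(slot)


def run_line(pool, previous_o, creations=1):
    """Simulate one input line containing `creations` calls of id(N())."""
    ids = []
    for _ in range(creations):
        slot = pool.alloc()       # N() is created
        ids.append(slot)          # id() reports its slot
        pool.release(slot)        # N() is destroyed straight away
    current_o = pool.alloc()      # assumed: the shell allocates object O
    if previous_o is not None:
        pool.release(previous_o)  # assumed: the previous line's O is dropped now
    return ids, current_o


pool = ToyPool(["A", "B", "C"])
held = None
seen = []
for _ in range(4):
    ids, held = run_line(pool, held)
    seen.append(ids[0])
print(seen)   # ['A', 'B', 'A', 'B']

pair, _ = run_line(ToyPool(["A", "B", "C"]), None, creations=2)
print(pair)   # ['A', 'A']
```

Separate lines alternate between two slots because the shell's object is sitting in the other one, while two creations on a single line reuse the same slot.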

This also explains why it doesn't happen for old-style objects. Presumably the old-style objects are a different size, so they go in a different pool, and don't share slots with whatever objects the interactive shell is creating.

Fifth question: what objects might the interactive shell be allocating?

See pythonrun.c for the details, but basically the interactive shell:

  1. Reads your input and allocates strings containing your code.

  2. Calls the parser, which constructs an abstract syntax tree describing the code.

  3. Calls the compiler, which constructs the compiled byte code.

  4. Calls the evaluator, which allocates objects for stack frames, locals, globals etc.

I don't know exactly which of these objects is to blame for the "juggling". Not the input strings (strings have their own specialized allocator); not the abstract syntax tree (it gets thrown away after it's been compiled). Maybe it's the byte code object.
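The parse and compile steps can be imitated from Python itself, which at least shows what kinds of objects are involved. This is only an analogue, assuming Python 3: the real shell does this work in C, and `"<stdin>"` is just a label for the source here:

```python
import ast
import types

source = "id(object())\n"

# Step 2: parse the line into an abstract syntax tree ("single" = one interactive statement).
tree = ast.parse(source, filename="<stdin>", mode="single")

# Step 3: compile the tree into a code object holding the byte code.
code = compile(tree, filename="<stdin>", mode="single")

print(type(tree).__name__)                 # Interactive
print(isinstance(code, types.CodeType))    # True
print(len(code.co_code) > 0)               # the byte code is stored as a bytes object
```

Both the tree and the code object are ordinary heap allocations made for every line typed, so any of them could in principle share a pool with a small instance.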

Gareth Rees, answered Nov 03 '22 01:11


The documentation says it all:

id(object):

Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

Whenever you call a constructor, this creates a new object. The object has an id that's different from the id of any other object that's currently alive.

>>> n1 = N()
>>> n2 = N()
>>> id(n1) == id(n2)
False

The "contents" of the two objects do not matter. They are two distinct entities; it seems perfectly logical that they would get different ids.

In CPython, ids are simply memory addresses. They do get recycled: if an object gets garbage collected, another object created at some point in the future might get the same id. This is the behaviour you're seeing in your repeated id(N()), id(T()) tests: since you're not keeping references to the newly created objects, the interpreter is free to garbage collect them and reuse their ids.

The recycling of ids is clearly an implementation/platform artefact and should not be relied upon.
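The part that can be relied upon is the documented guarantee: ids are unique among objects that are alive at the same time. A short sketch (Python 3 assumed) that keeps references so the lifetimes overlap:

```python
class N(object):
    pass

class T(object):
    pass

# Holding the objects in a list keeps them all alive at once.
kept = [N(), T(), N(), T()]
ids = [id(obj) for obj in kept]

all_distinct = len(set(ids)) == len(ids)
print(all_distinct)   # True
```

Without the list, each temporary would die before the next one is created and the ids could repeat, as in the question.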

NPE, answered Nov 03 '22 00:11