This question is derived from here.
I have three large lists containing Python objects (l1, l2 and l3). These lists are created when the program starts and they take a total of 16 GB of RAM. The program will be used on Linux exclusively.
I do not need to modify these lists or the objects in these lists in any way or form after they are created. They must remain in memory until the program exits.
I am using os.fork() and the multiprocessing module in my program to spawn multiple sub-processes (up to 20 currently). Each of these sub-processes needs to be able to read the three lists (l1, l2 and l3).
My program is otherwise working fine and is quite fast. However, I am having problems with memory consumption. I was hoping that each sub-process could use the three lists without copying them in memory, thanks to the copy-on-write approach on Linux. However, this is not the case: referencing any object in any of these lists increases its reference count, and that write causes the entire page of memory holding the object to be copied.
So my question would be:
Can I disable the reference counting on l1, l2 and l3 and all of the objects in these lists? Basically, I want to make each entire object (including metadata such as the ref count) read-only, so that it will never be modified under any circumstances (this, I assume, would allow me to take advantage of copy-on-write).
Currently I fear that I am forced to move to another programming language to accomplish this task because of a "feature" (ref counting) that I do not currently need, but that is still forced upon me and is causing unnecessary problems.
Every object in CPython has a reference count and a pointer to a type. We can get the current reference count of an object with the sys module, using sys.getrefcount(object), but keep in mind that passing the object to getrefcount() temporarily increases the reference count by 1.
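As a rough illustration of sys.getrefcount(), the sketch below compares counts before and after storing extra references, rather than relying on absolute values (which differ between CPython versions). The names obj, holders, baseline, with_list and after_del are made up for this example, and CPython is assumed.

```python
import sys

# A fresh object, so nothing else in the interpreter holds a reference to it.
obj = object()

# Absolute values vary across CPython versions, so only differences are compared.
baseline = sys.getrefcount(obj)

# Storing the object twice in a list adds two references.
holders = [obj, obj]
with_list = sys.getrefcount(obj)

# Deleting the list releases both references again.
del holders
after_del = sys.getrefcount(obj)

print(with_list - baseline)   # extra references held by the list
print(after_del == baseline)  # count is back to where it started
```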
Yes, there is a benefit of earlier collection with reference counting, but the main reason CPython uses it is historical. Originally there was no garbage collection for cyclic objects so cycles led to memory leaks. The C APIs and data structures are based heavily around the principle of reference counting.
If the reference count reaches zero, the object's type's deallocation function (which must not be NULL) is invoked. In the C API, this decrement (Py_DECREF) is usually used to delete a strong reference before exiting its scope.
If an object's reference count reaches zero, the object has become inaccessible, and can be destroyed. When an object is destroyed, any objects referenced by that object also have their reference counts decreased.
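The cascade described above can be observed with weak references, which let us check whether an object is still alive without keeping it alive. This is only a sketch that assumes CPython's immediate, refcount-driven destruction; the Node class and all variable names are hypothetical.

```python
import weakref

class Node:
    """Tiny container that holds a strong reference to one child."""
    def __init__(self, child=None):
        self.child = child

child = Node()
parent = Node(child)

# Weak references do not contribute to the reference count.
child_ref = weakref.ref(child)
parent_ref = weakref.ref(parent)

# Dropping our own name for the child is not enough: parent.child still refers to it.
del child
alive_after_first_del = child_ref() is not None

# Dropping the parent takes its count to zero; destroying it releases
# its reference to the child, whose count then reaches zero as well.
del parent
both_gone = parent_ref() is None and child_ref() is None

print(alive_after_first_del, both_gone)
```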
You can't. Reference counting is fundamental to CPython (the reference implementation, and the one you are using). Calling methods on objects causes reference counts to change; item subscription and attribute access cause objects to be pushed onto and popped off the stack, which also uses reference counts; and so on. You cannot get around this.
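To make this concrete, here is a small sketch (assuming CPython; the names items, first, before and after are illustrative) showing that merely reading an element out of a list and binding it to a name changes the count stored inside the object itself. After a fork, that kind of write is what makes the kernel copy the page.

```python
import sys

# Stand-in for one of the large, never-modified lists.
items = [object() for _ in range(3)]

before = sys.getrefcount(items[0])

# A "read-only" access: no list or element is modified at the Python level,
# yet binding the name writes a new count into the element's header.
first = items[0]

after = sys.getrefcount(items[0])

print(after - before)  # the element now has one more reference
```

Even without keeping the name around, the temporary reference created during the lookup is incremented and then decremented again, which still writes to the object's memory.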
And if the contents of the lists don't change, use tuple()s instead. That won't change the fact that they'll be refcounted though.
Other implementations of Python, such as Jython (using the Java virtual machine), IronPython (a .NET runtime language) or PyPy (Python implemented in Python, but experimenting with JIT and other compiler techniques), are free to use different methods of memory management, and may or may not solve your memory problem.