
Python memory consumption on Linux: physical and virtual memory are growing while the heap size remains the same


I'm working on a system service of sorts (actually it's just a log parser) written in Python. The program should run continuously for a long time (days or weeks, I hope, without failures or restarts). That's why I am concerned about memory consumption.

I put together different information about process memory usage from different sites into one simple function:

#!/usr/bin/env python
from pprint import pprint
from guppy import hpy
from datetime import datetime
import sys
import os
import resource
import re

def debug_memory_leak():
    # Virtual memory size: the VmSize field of /proc/<pid>/status, in kB
    pid = os.getpid()
    with open(os.path.join("/proc", str(pid), "status")) as f:
        lines = f.readlines()
    _vmsize = [l for l in lines if l.startswith("VmSize")][0]
    vmsize = int(_vmsize.split()[1])

    # Physical memory size. Note: ru_maxrss is the peak resident set size
    # (kilobytes on Linux), so this value can never decrease.
    pmsize = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    # Dynamic memory segment: total number of objects in memory and heap size
    h = hpy().heap()
    if __debug__:
        print(str(h))
    m = re.match(
        r"Partition of a set of ([0-9]+) objects\. Total size = ([0-9]+) bytes(.*)", str(h))
    objects = m.group(1)
    heap = int(m.group(2)) // 1024  # bytes to KB

    current_time = datetime.now().strftime("%H:%M:%S")
    data = (current_time, objects, heap, pmsize, vmsize)
    print("\t".join([str(d) for d in data]))

I used this function to study the memory consumption of my long-running process over time, and I still cannot explain its behavior. You can see that the heap size and the total number of objects did not change, while the physical and virtual memory increased by 11% and 1% respectively during these twenty minutes.

UPD: The process has now been running for almost 15 hours. The heap is still the same, but the physical memory has increased sixfold and the virtual memory by 50%. The curve seems to be linear, except for the strange outliers at 3:00 AM:

Time      Obj    Heap (KB)  PhM (KB)  VM (KB)
19:04:19  31424  3928       5460      143732
19:04:29  30582  3704       10276     158240
19:04:39  30582  3704       10372     157772
19:04:50  30582  3709       10372     157772
19:05:00  30582  3704       10372     157772
(...)
19:25:00  30583  3704       11524     159900
09:53:23  30581  3704       62380     210756

I wonder what is going on with the address space of my process. The constant heap size suggests that all of the dynamically allocated objects are deallocated correctly. But I have no doubt that growing memory consumption will affect the sustainability of this life-critical process in the long run.
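One detail worth checking in the measurements above: `ru_maxrss` reports the peak resident set size, so the PhM column can only grow or stay flat. A minimal sketch of reading the current value instead, assuming a Linux `/proc/<pid>/status` file that exposes the `VmRSS` (current resident set) and `VmHWM` (peak resident set) fields next to `VmSize`. The helper names `parse_proc_status` and `current_memory` are hypothetical and not part of the original function.

```python
import os


def parse_proc_status(text, fields=("VmSize", "VmRSS", "VmHWM")):
    """Return {field: size in kB} for the requested /proc/<pid>/status fields."""
    sizes = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields:
            # A matching line looks like "VmRSS:     10372 kB".
            sizes[name] = int(rest.split()[0])
    return sizes


def current_memory(pid=None):
    """Read the memory fields of a live process (Linux only)."""
    pid = os.getpid() if pid is None else pid
    with open(os.path.join("/proc", str(pid), "status")) as f:
        return parse_proc_status(f.read())
```

Logging `current_memory()["VmRSS"]` alongside `ru_maxrss` would show whether the resident set is still growing or whether only an earlier peak is being reported.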


Could anyone clarify this issue please? Thank you.

(I use RHEL 6.4, kernel 2.6.32-358 with Python 2.6.6)

Vitaly Isaev asked Apr 29 '14 16:04





1 Answer

Without knowing what your program is doing, this might help.

I came across this article when working on a project a while back: http://chase-seibert.github.io/blog/2013/08/03/diagnosing-memory-leaks-python.html. It says, "Long running Python jobs that consume a lot of memory while running may not return that memory to the operating system until the process actually terminates, even if everything is garbage collected properly."

I ended up using the multiprocessing module to have my project fork a separate process whenever it needed to do work and return once the work was done, and I haven't noticed any memory issues since.

Alternatively, try running it under Python 3.3; see http://bugs.python.org/issue11849
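The multiprocessing approach mentioned above could look roughly like the following. This is only a sketch, assuming Python 3.4+ on Linux (it uses the `fork` start method, which the question's Python 2.6 does not expose this way). The names `count_errors`, `_child_main` and `run_in_child` are hypothetical stand-ins, not code from my project.

```python
import multiprocessing


def count_errors(lines):
    """Hypothetical unit of work: count the lines containing 'ERROR'."""
    return sum(1 for line in lines if "ERROR" in line)


def _child_main(queue, func, args):
    # Runs in the child process and reports success or failure to the parent.
    try:
        queue.put((True, func(*args)))
    except Exception as exc:
        queue.put((False, repr(exc)))


def run_in_child(func, *args):
    """Run func(*args) in a forked child process and return its result.

    Whatever memory the child allocated goes back to the operating
    system when the child exits.
    """
    ctx = multiprocessing.get_context("fork")  # Unix-only start method
    queue = ctx.Queue()
    proc = ctx.Process(target=_child_main, args=(queue, func, args))
    proc.start()
    ok, payload = queue.get()  # read before join() so a large result cannot block the child
    proc.join()
    if not ok:
        raise RuntimeError("child process failed: " + payload)
    return payload


if __name__ == "__main__":
    print(run_in_child(count_errors, ["a ERROR", "b ok", "c ERROR"]))
```

The long-lived parent only ever holds the small result, so its own footprint should stay flat regardless of how much memory each batch needs while it is being processed.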

user3588162 answered Sep 23 '22 06:09
