Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Twisted getPage(): process memory grow when requesting lot of pages

I am writing a script for contstant (each 30-120 sec) grabbing of information quering a large set of URLs (Icecast/Shoutcast servers status pages), about 500 urls. It works fine, but the python process resident size is constantly keep growing. I am sure it is endless grow as I left it running for several hours and it took 1.2Gb RES from initial 30Mb.

I simplified the script for easy understanding to the following:

from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.enterprise import adbapi

def ok(res, url):
    print "OK: " + str(url)
    reactor.callLater(30, load, url)

def error(res, url):
    print "FAIL: " + str(url)
    reactor.callLater(30, load, url)

def db_ok(res):
    for item in res:
        if item[1]:
            print "ADDED: " + str(item[1])
            reactor.callLater(30, load, item[1])

def db_error(res):
    print "Database error: " + str(res)
    reactor.stop()

def load(url):
    d = getPage(url,
                headers={"Accept": "text/html"},
                timeout=30)
    d.addCallback(ok, url)
    d.addErrback(error, url)


dbpool = adbapi.ConnectionPool("MySQLdb", "host", "user", "passwd", db="db")
q = dbpool.runQuery("SELECT id, url FROM stations")
q.addCallback(db_ok).addErrback(db_error)

reactor.run()

It grows the same as the original daemon and thus I localized the problem. I think it is related to twisted.web.client.getPage() somehow. In original daemon I have used twisted.manhole for heap evalutaions with meliae during run time, but do not see anything nasty.

First meliae dump made in right after only 1 or 2 query cycles are finished:

Total 84313 objects, 188 types, Total size = 15.9MiB (16647235 bytes)
 Index   Count   %      Size   % Cum     Max Kind
     0    5806   6   4142800  24  24  786712 dict
     1   28070  33   2223457  13  38    4874 str
     2     612   0   1636992   9  48    3424 HTTPClientFactory
     3   19599  23   1585720   9  57     608 tuple
     4     643   0    720160   4  61    1120 DelayedCall
     5     642   0    713904   4  66    1112 Client
     6     617   0    691040   4  70    1120 Connector
     7     639   0    577656   3  73     904 type
     8     691   0    556576   3  77    1120 Deferred
     9    3962   4    475440   2  80     120 function
    10    3857   4    462840   2  82     120 code
    11    3017   3    308192   1  84    4856 list
    12     240   0    266880   1  86    1112 Method
    13    2968   3    237440   1  87      80 instancemethod
    14     612   0    215424   1  88     352 InsensitiveDict
    15     217   0    211128   1  90   12624 module
    16    2185   2    157320   0  91      72 builtin_function_or_method
    17     107   0    119840   0  91    1120 HTTPPageGetter
    18     343   0    117992   0  92     344 IcecastRadioStation
    19     343   0    117992   0  93     344 HTTPExtractor

And top around that time:

VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
248m  27m 4152 R   92  1.6   0:09.21 python

Now we wait for a time and check again, this is the picture after 20 minutes running (around 40 query cycles):

Total 67428 objects, 188 types, Total size = 11.9MiB (12463799 bytes)
 Index   Count   %      Size   % Cum     Max Kind
     0    3865   5   3601624  28  28  786712 dict
     1   23762  35   2002029  16  44    4874 str
     2   16382  24   1346208  10  55     608 tuple
     3     644   0    582176   4  60     904 type
     4     174   0    554304   4  64    3424 HTTPClientFactory
     5     456   0    510720   4  68    1120 DelayedCall
     6    3963   5    475560   3  72     120 function
     7    3857   5    462840   3  76     120 code
     8     240   0    266880   2  78    1112 Method
     9     237   0    263544   2  80    1112 Client
    10     217   0    211128   1  82   12624 module
    11     187   0    209440   1  84    1120 Connector
    12     182   0    194624   1  85    1120 Deferred
    13    1648   2    179696   1  87    3768 list
    14    1530   2    122400   0  88      80 instancemethod
    15     343   0    117992   0  89     344 IcecastRadioStation
    16     343   0    117992   0  90     344 HTTPExtractor
    17    1175   1    103400   0  90      88 weakref
    18    1109   1     88720   0  91      80 wrapper_descriptor
    19      75   0     83400   0  92    1112 InterfaceClass

And top:

VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
532m 240m 4152 S   54 13.7   4:02.64 python

According to meliae neither objects count is growing nor total size. But the process eat up 200Mb of resident memory during this 20 minutes.

I also used valgrind on python, but no leaks are found. Any thoughts?

I am using Python version 2.6.6, twisted version 10.2.0


Update #1: I also used valgrind massif to profile CPython memory usage, here is allocation tree for 99.93% of memory allocated:
99.93% (578,647,287B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->94.69% (548,309,283B) 0x550819: O_cwrite (cStringIO.c:406)
| ->94.69% (548,309,283B) 0x55096F: O_write (cStringIO.c:436)
|   ->94.69% (548,309,283B) 0x5A17F9: PyCFunction_Call (methodobject.c:81)
|     ->94.69% (548,309,283B) 0x4D1373: call_function (ceval.c:3750)
|       ->94.69% (548,309,283B) 0x4CC2A2: PyEval_EvalFrameEx (ceval.c:2412)
|         ->94.69% (548,309,283B) 0x4D1868: fast_function (ceval.c:3836)
|           ->94.69% (548,309,283B) 0x4D1549: call_function (ceval.c:3771)
|             ->94.69% (548,309,283B) 0x4CC2A2: PyEval_EvalFrameEx (ceval.c:2412)
|               ->94.69% (548,309,283B) 0x4D1868: fast_function (ceval.c:3836)
|                 ->94.69% (548,309,283B) 0x4D1549: call_function (ceval.c:3771)
|                   ->94.69% (548,309,283B) 0x4CC2A2: PyEval_EvalFrameEx (ceval.c:2412)
|                     ->94.69% (548,309,283B) 0x4D1868: fast_function (ceval.c:3836)
|                       ->94.69% (548,309,283B) 0x4D1549: call_function (ceval.c:3771)
|                         ->94.69% (548,309,283B) 0x4CC2A2: PyEval_EvalFrameEx    (ceval.c:2412)
|                           ->94.69% (548,309,283B) 0x4D1868: fast_function (ceval.c:3836)
|                             ->94.69% (548,309,283B) 0x4D1549: call_function (ceval.c:3771)
|                               ->94.69% (548,309,283B) 0x4CC2A2: PyEval_EvalFrameEx (ceval.c:2412)
|                                 ->94.69% (548,309,283B) 0x4CEBB3: PyEval_EvalCodeEx (ceval.c:3000)
|                                   ->94.69% (548,309,283B) 0x5A0DC6: function_call (funcobject.c:524)
|                                     ->94.69% (548,309,283B) 0x4261E8: PyObject_Call (abstract.c:2492)
|                                       ->94.69% (548,309,283B) 0x4D2870: ext_do_call (ceval.c:4063)
|                                         ->94.69% (548,309,283B) 0x4CC4E3: PyEval_EvalFrameEx (ceval.c:2452)
|                                           ->94.69% (548,309,283B) 0x4CEBB3: PyEval_EvalCodeEx (ceval.c:3000)
|                                             ->94.69% (548,309,283B) 0x5A0DC6: function_call (funcobject.c:524)
|                                               ->94.69% (548,309,283B) 0x4261E8: PyObject_Call (abstract.c:2492)
|                                                 ->94.69% (548,309,283B) 0x4D2870: ext_do_call (ceval.c:4063)
|                                                   ->94.69% (548,309,283B) 0x4CC4E3: PyEval_EvalFrameEx (ceval.c:2452)
|                                                     ->94.69% (548,309,283B) 0x4CEBB3: PyEval_EvalCodeEx (ceval.c:3000)
|                                                       ->94.69% (548,309,283B) 0x5A0DC6: function_call (funcobject.c:524)
|                                                         ->94.69% (548,309,283B) 0x4261E8: PyObject_Call (abstract.c:2492)
like image 409
Anton Martchukov Avatar asked Nov 15 '22 01:11

Anton Martchukov


1 Answers

My guess is that you're scheduling these page fetches based on a fixed timer, and you're not paying attention to where the fetches actually end. Pretend that each page takes 60 seconds to fetch. You have a huge pile of fetches scheduled for 30 seconds, and then again in 30 seconds, more and more piling up as you're completing earlier requests. This is just a guess, though, as even this simplified example isn't completely self-contained. (Can you reproduce it without having a database involved, with just a fixed list of URLs?)

That stack trace is not particularly helpful, either; effectively, it just says that the memory was allocated by calling a python function, which should be obvious. You might want to try a Python-specific memory profiler like Heapy or Dowser to see where your Python objects are going.

like image 51
Glyph Avatar answered Dec 06 '22 22:12

Glyph