
How do I disable history in Python's mechanize module?

I have a web-scraping script that fetches new data once a minute, but over the course of a couple of days it ends up using 200 MB or more of memory. I found out that this is because mechanize keeps an unbounded browser history for the .back() function to use.

I have looked in the docstrings and found the clear_history() function of the Browser class, which I now invoke on each refresh, but memory usage still grows by 2-3 MB per page refresh.

Edit: Hmm, it seems it kept doing the same thing after I called clear_history(), until it reached about 30 MB of memory usage, then dropped back down to 10 MB or so (the base amount my program starts with). Is there any way to force this behavior on a more regular basis?

How do I keep mechanize from storing all of this info? I don't need to keep any of it. I'd like to keep my Python script below 15 MB of memory usage.

ThantiK asked Mar 06 '10 17:03


1 Answer

You can pass an argument history=whatever when you instantiate the Browser; the default value is None, which means the browser instantiates its own History class (to allow back and reload). The simplest approach (which will raise an AttributeError if you ever do call back or reload):

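import mechanize

class NoHistory(object):
  # No-op stand-in for mechanize's History: records and keeps nothing.
  def add(self, *a, **k): pass
  def clear(self): pass

# Inject the stub so the browser never accumulates visited pages.
b = mechanize.Browser(history=NoHistory())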

A cleaner approach would implement the other history methods in NoHistory so that erroneous use of the browser's back or reload raises a clearer exception, but this simple one should suffice otherwise.
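As a rough sketch of that cleaner variant: the class below still discards everything, but fails loudly when navigation needs stored pages. The back and close method names and their signatures are assumptions about what the browser calls on its history object, and StrictNoHistory and HistoryDisabledError are made-up names, so check them against the mechanize version you have installed.

```python
class HistoryDisabledError(RuntimeError):
    """Raised on attempts to navigate a disabled history (hypothetical name)."""


class StrictNoHistory(object):
    """History stand-in that stores nothing and explains itself on misuse."""

    def add(self, *args, **kwargs):
        # Called after each page load; deliberately keep no reference.
        pass

    def clear(self):
        # Nothing is ever stored, so there is nothing to clear.
        pass

    def close(self):
        # Assumption: the browser closes its history on shutdown; no-op here.
        pass

    def back(self, *args, **kwargs):
        # Assumption: the browser's back() delegates to this method.
        raise HistoryDisabledError("history is disabled; back() is unavailable")


if __name__ == "__main__":
    history = StrictNoHistory()
    history.add("request", "response")  # silently dropped
    try:
        history.back(1, None)
    except HistoryDisabledError as exc:
        print("refused: %s" % exc)
```

It would be injected the same way as the simple version, i.e. by passing an instance as the history argument when the Browser is created.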

Note that this is an elegant (though not well documented;-) use of the dependency injection design pattern: in a (bleah) "monkeypatching" world, the client code would be expected to overwrite b._history after the browser is instantiated, but with dependency injection you just pass in the "history" object you want to use. I've often maintained that dependency injection may be the most important design pattern that wasn't in the "Gang of Four" book!-).
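To make the contrast concrete, here is a toy sketch with no mechanize involved; Client, ListHistory and NullHistory are hypothetical names invented purely for illustration. The constructor takes its history collaborator as an argument and only builds a default when none is supplied, which is the same shape as the history= parameter described above.

```python
class ListHistory(object):
    """Default collaborator: remembers every visited page."""

    def __init__(self):
        self.pages = []

    def add(self, page):
        self.pages.append(page)


class NullHistory(object):
    """Replacement collaborator: accepts pages and forgets them."""

    def add(self, page):
        pass


class Client(object):
    def __init__(self, history=None):
        # Dependency injection: use the caller's object if one was given,
        # otherwise fall back to the default implementation.
        self._history = history if history is not None else ListHistory()

    def open(self, page):
        self._history.add(page)
        return page


# Injection: the choice is made through the public constructor.
injected = Client(history=NullHistory())
injected.open("page-1")

# Monkeypatching: reach into a private attribute after construction.
patched = Client()
patched._history = NullHistory()
patched.open("page-1")

# Default behaviour for comparison: the page is retained.
default = Client()
default.open("page-1")
print(len(default._history.pages))
```

Both routes end with the same object in place, but only the injected one goes through an interface the class actually advertises; the patched one depends on a private attribute name that could change.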

Alex Martelli answered Oct 01 '22 05:10