I have a Python interpreter embedded inside an application. The application takes a long time to start up and I have no ability to restart the interpreter without restarting the whole application. What I would like to do is essentially save the state of the interpreter and return to that state easily.
I started by storing the names of all modules in sys.modules that the Python interpreter started with and then deleting all new modules from sys.modules when requested. This appears to make the interpreter prepared to re-import the same modules even though it has already imported them before. However, this doesn't work in all situations, for example with singleton classes, static methods and other state that lives outside sys.modules.
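For reference, the snapshot-and-delete approach described above could look roughly like the sketch below. The function names snapshot_modules and restore_modules are made up for illustration, and colorsys is just an arbitrary small stdlib module standing in for "something imported later".

```python
import sys

def snapshot_modules():
    """Record the names of every module currently in sys.modules."""
    return set(sys.modules)

def restore_modules(baseline):
    """Remove every module imported since the snapshot; return the removed names."""
    added = [name for name in list(sys.modules) if name not in baseline]
    for name in added:
        del sys.modules[name]
    return sorted(added)

baseline = snapshot_modules()
import colorsys  # arbitrary example of a module imported after the snapshot
removed = restore_modules(baseline)
```

Note that this only forgets the sys.modules entries. Objects created from the removed modules, and any references other code holds to them, are untouched.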
I'd rather not embed another interpreter inside this interpreter if it can be avoided, as the ease of using the application's API would be lost (and I imagine there would be a slight speed hit as well).
So, does anyone know of a way I could store the interpreter's state and then return to this so that it can cope with all situations?
Thanks,
Dan
"storing the names of all modules in sys.modules that the python interpreter started with and then deleting all new modules from sys.modules when requested. This appears to make the interpreter prepared to re-import the same modules even though it has already imported them before."
The module-reload-forcing approach can be made to work in some circumstances but it's a bit hairy. In summary:
You need to make sure that all modules that have dependencies on each other are all reloaded at once. So any module 'x' that does 'import y' or 'from y import ...' must be deleted from sys.modules at the same time as module 'y'.
This process will need protecting with a lock if your app or any other active module is using threads.
Any module that leaves hooks pointing to itself in other modules cannot usefully be reloaded as references to the old module will remain in unreloaded/unreloadable code. This includes stuff like exception hooks, signals, warnings filters, encodings, monkey-patches and so on. If you start blithely reloading modules containing other people's code you might be surprised how often they do stuff like that, potentially resulting in subtle and curious errors.
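To make the stale-reference problem concrete, here is a small self-contained sketch. The module name stale_demo and its Config class are invented for the demonstration; old_instance stands in for any hook or object that other code is still holding on to.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module (hypothetical name "stale_demo") into a temp directory.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "stale_demo.py"), "w") as f:
    f.write("class Config(object):\n    pass\n")
sys.path.insert(0, tmpdir)

old_module = importlib.import_module("stale_demo")
old_instance = old_module.Config()  # stands in for a reference held elsewhere

# Force a fresh import by deleting the sys.modules entry.
del sys.modules["stale_demo"]
new_module = importlib.import_module("stale_demo")

# The re-import produced a brand new module object and a brand new class,
# so the object created earlier no longer matches the "same" class.
same_module = new_module is old_module                       # False
still_matches = isinstance(old_instance, new_module.Config)  # False
```

Anything that compares classes, catches exceptions by type, or keeps a singleton on the old module will keep seeing the old objects, which is where the subtle errors come from.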
So to get it to work you need to have well-defined boundaries between interdependent modules - "was it imported at initial start-up time" probably isn't quite good enough - and to make sure they're nicely encapsulated without unexpected dependencies like monkey-patching.
This can be based on folder, so for example anything in /home/me/myapp/lib could be reloaded as a unit, whilst leaving other modules alone, especially the contents of the stdlib in e.g. /usr/lib/python2.x/, which in general cannot be reloaded reliably. I've got code for this in an as-yet-unreleased webapp reloading wrapper, if you need it.
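This is not the wrapper mentioned above, just a minimal sketch of the folder-based idea under some assumptions: the function name purge_modules_under and the lock _reload_lock are made up, and a module counts as "in the folder" if its __file__ resolves to a path under the given root.

```python
import os
import sys
import threading

# Serialises callers of purge_modules_under(). It does not stop other threads
# from importing at the same time; they would have to take the same lock.
_reload_lock = threading.Lock()

def purge_modules_under(root):
    """Drop every module whose source file lives under `root` from sys.modules.

    All matching modules are removed together, so interdependent modules in
    that folder get re-imported as a unit the next time they are requested.
    """
    root = os.path.realpath(root) + os.sep
    purged = []
    with _reload_lock:
        for name, module in list(sys.modules.items()):
            # Built-in modules and namespace packages have no usable __file__.
            path = getattr(module, "__file__", None)
            if path and os.path.realpath(path).startswith(root):
                del sys.modules[name]
                purged.append(name)
    return sorted(purged)
```

Calling purge_modules_under('/home/me/myapp/lib') would then forget everything loaded from that folder and leave the stdlib and third-party modules alone. The caveats about hooks and stale references still apply to whatever gets purged.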
Finally:
This is a nasty implementation detail which might change and break your app in some future Python version, but that is the price for playing with sys.modules in unsupported ways.
Try this code from ActiveState recipes: http://code.activestate.com/recipes/572213/
It extends pickle so it supports pickling anything defined in the shell console. Theoretically you should just be able to pickle the main module, according to their documentation:
import savestate, pickle, __main__
# Dump the whole __main__ module with pickle protocol 2; the file is closed on exit.
with open('savestate.pickle', 'wb') as f:
    pickle.dump(__main__, f, 2)
I'd suggest tackling the root cause problem.
"The application takes a long time to start up and I have no ability to restart the interpreter without restarting the whole application"
I doubt this is actually 100% true. If the overall application is the result of an act of Congress, okay, it can't be changed. But if the overall application was written by real people, then finding and moving the code to restart the Python interpreter should be possible. It's cheaper, simpler and more reliable than anything else you might do to hack around the problem.