 

Prevent Python from caching the imported modules

While developing a largeish project (split in several files and folders) in Python with IPython, I run into the trouble of cached imported modules.

The problem is that the statement import module only reads the module once, even if that module has changed! So each time I change something in my package, I have to quit and restart IPython. Painful.
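The caching behaviour described here can be seen with any standard-library module; a minimal illustration:

```python
import sys

# The first import executes the module and stores it in sys.modules.
import json
first = sys.modules['json']

# A second import is just a dictionary lookup: the same object comes
# back and the module body is not executed again.
import json
assert json is first
assert sys.modules['json'] is first
```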

Is there any way to properly force reloading some modules? Or, better, to somehow prevent Python from caching them?

I tried several approaches, but none works. In particular I run into really, really weird bugs, like some modules or variables mysteriously becoming equal to None...

The only sensible resource I found is Reloading Python modules, from pyunit, but I have not checked it. I would like something like that.

A good alternative would be a way to restart IPython, or to restart the Python interpreter somehow.

So, if you develop in Python, what solution have you found to this problem?

Edit

To make things clear: obviously, I understand that some old variables depending on the previous state of the module may stick around. That's fine by me. But why is it so difficult in Python to force-reload a module without all sorts of strange errors happening?

More specifically, if I have my whole module in one file module.py then the following works fine:

import sys

try:
    del sys.modules['module']
except KeyError:
    # del on a missing dict key raises KeyError, not AttributeError
    pass

import module

obj = module.my_class()

This piece of code works beautifully and I can develop without quitting IPython for months.

However, whenever my module is made of several submodules, all hell breaks loose:

import sys

for mod in ['module.submod1', 'module.submod2']:
    try:
        del sys.modules[mod]
    except KeyError:
        pass
# sometimes this works, sometimes not. WHY?

Why does it make such a difference to Python whether my module is in one big file or split into several submodules? Why does that approach not work?

Olivier Verdier asked May 27 '10




2 Answers

import checks to see if the module is in sys.modules, and if it is, it returns it. If you want import to load the module fresh from disk, you can delete the appropriate key in sys.modules first.
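Deleting the key and re-importing can be sketched like this (using a stdlib module for the demonstration):

```python
import sys

import json
old = sys.modules['json']

# Dropping the cache entry makes the next import statement load the
# module from disk again, producing a brand-new module object.
del sys.modules['json']
import json

assert json is not old           # a fresh module object
assert json.loads('[1]') == [1]  # and it is fully functional
```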

There is the reload builtin function (importlib.reload in Python 3) which will, given a module object, reload it from disk, and that will get placed in sys.modules. Edit -- actually, it will recompile the code from the file on the disk, and then re-evaluate it in the existing module's __dict__. That is potentially very different from making a new module object.
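The "existing __dict__" point is the key difference from the delete-and-reimport approach: every existing reference to the module object stays valid. A quick check (using importlib.reload, the Python 3 spelling):

```python
import importlib
import json

before = json
reloaded = importlib.reload(json)

# reload() re-executes the source inside the *existing* module object,
# so the identity of the module is preserved.
assert reloaded is before
assert reloaded is json
```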

Mike Graham is right though; getting reloading right is hard if you have even a few live objects that reference the contents of the module you no longer want. That existing objects still reference the classes they were instantiated from is an obvious issue, but all references created by means of from module import symbol will also still point to whatever object came from the old version of the module. Many subtly wrong things are possible.
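The stale from-import binding is easy to demonstrate with a throwaway module written to disk (the name tempmod here is made up for the demonstration):

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # force reload to recompile from source

# Create a hypothetical module 'tempmod' on disk and import it.
tmpdir = tempfile.mkdtemp()
path = pathlib.Path(tmpdir) / 'tempmod.py'
path.write_text('VALUE = 1\n')
sys.path.insert(0, tmpdir)

import tempmod
from tempmod import VALUE  # VALUE is now a separate local binding

# Change the source and reload the module.
path.write_text('VALUE = 2\n')
importlib.reload(tempmod)

assert tempmod.VALUE == 2  # attribute access sees the new version
assert VALUE == 1          # the from-import binding is stale
```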

Edit: I agree with the consensus that restarting the interpreter is by far the most reliable thing. But for debugging purposes, I guess you could try something like the following. I'm certain that there are corner cases for which this wouldn't work, but if you aren't doing anything too crazy (otherwise) with module loading in your package, it might be useful.

import sys
import types

def reload_package(root_module):
    package_name = root_module.__name__

    # get a reference to each loaded module
    loaded_package_modules = dict([
        (key, value) for key, value in sys.modules.items()
        if key.startswith(package_name) and isinstance(value, types.ModuleType)])

    # delete references to these loaded modules from sys.modules
    for key in loaded_package_modules:
        del sys.modules[key]

    # load each of the modules again;
    # make old modules share state with new modules
    for key in loaded_package_modules:
        print('reloading %s' % key)
        # __import__ returns the top-level package for a dotted name,
        # so look the freshly imported module up in sys.modules instead
        __import__(key)
        newmodule = sys.modules[key]
        oldmodule = loaded_package_modules[key]
        oldmodule.__dict__.clear()
        oldmodule.__dict__.update(newmodule.__dict__)

Which I very briefly tested like so:

import email, email.mime, email.mime.application
reload_package(email)

printing:

reloading email.iterators
reloading email.mime
reloading email.quoprimime
reloading email.encoders
reloading email.errors
reloading email
reloading email.charset
reloading email.mime.application
reloading email._parseaddr
reloading email.utils
reloading email.mime.base
reloading email.message
reloading email.mime.nonmultipart
reloading email.base64mime
Matt Anderson answered Sep 28 '22


Quitting and restarting the interpreter is the best solution. Any sort of live reloading or no-caching strategy will not work seamlessly: objects from no-longer-existing modules can live on, modules sometimes store state, and even if your use case really does allow hot reloading, it is too complicated to reason about to be worth it.
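Restarting does not have to be manual: one way to get the same guarantee programmatically is to run the code under test in a fresh interpreter each time, so every import starts from an empty cache. A minimal sketch:

```python
import subprocess
import sys

# Run a snippet in a brand-new interpreter; nothing from this process's
# sys.modules carries over, so every module is loaded fresh from disk.
result = subprocess.run(
    [sys.executable, '-c', 'import json; print(json.dumps([1, 2]))'],
    capture_output=True, text=True, check=True,
)
assert result.stdout.strip() == '[1, 2]'
```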

Mike Graham answered Sep 28 '22