Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to load compiled python modules from memory?

Tags:

python

module

I need to read all modules (pre-compiled) from a zipfile (built by py2exe compressed) into memory and then load them all. I know this can be done by loading direct from the zipfile but I need to load them from memory. Any ideas? (I'm using python 2.5.2 on windows) TIA Steve

like image 348
Steve Avatar asked Dec 02 '09 04:12

Steve


People also ask

How do I load a Python module?

In Python, modules are accessed by using the import statement. When you do this, you execute the code of the module, keeping the scopes of the definitions so that your current file(s) can make use of these.

How do I import an entire module?

You need to use the import keyword along with the desired module name. When interpreter comes across an import statement, it imports the module to your current program. You can use the functions inside a module by using a dot(.) operator along with the module name.

Can I dynamically load a module in Python?

Using the imp module: Modules can be imported dynamically by the imp module in python. The example below is a demonstration on the using the imp module. It provides the find_module() method to find the module and the import_module() method to import it.

What are the three ways to import modules in Python?

So there's four different ways to import: Import the whole module using its original name: pycon import random. Import specific things from the module: pycon from random import choice, randint. Import the whole module and rename it, usually using a shorter variable name: pycon import pandas as pd.


2 Answers

It depends on what exactly you have as "the module (pre-compiled)". Let's assume it's exactly the contents of a .pyc file, e.g., ciao.pyc as built by:

$ cat>'ciao.py'
def ciao(): return 'Ciao!' 
$ python -c'import ciao; print ciao.ciao()'
Ciao!

IOW, having thus built ciao.pyc, say that you now do:

$ python
Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> b = open('ciao.pyc', 'rb').read()
>>> len(b)
200

and your goal is to go from that byte string b to an importable module ciao. Here's how:

>>> import marshal
>>> c = marshal.loads(b[8:])
>>> c
<code object <module> at 0x65188, file "ciao.py", line 1>

this is how you get the code object from the .pyc binary contents. Edit: if you're curious, the first 8 bytes are a "magic number" and a timestamp -- not needed here (unless you want to sanity-check them and raise exceptions if warranted, but that seems outside the scope of the question; marshal.loads will raise anyway if it detects a corrupt string).

Then:

>>> import types
>>> m = types.ModuleType('ciao')
>>> import sys
>>> sys.modules['ciao'] = m
>>> exec c in m.__dict__

i.e: make a new module object, install it in sys.modules, populate it by executing the code object in its __dict__. Edit: the order in which you do the sys.modules insertion and exec matters if and only if you may have circular imports -- but, this is the order Python's own import normally uses, so it's better to mimic it (which has no specific downsides).

You can "make a new module object" in several ways (e.g., from functions in standard library modules such as new and imp), but "call the type to get an instance" is the normal Python way these days, and the normal place to obtain the type from (unless it has a built-in name or you otherwise have it already handy) is from the standard library module types, so that's what I recommend.

Now, finally:

>>> import ciao
>>> ciao.ciao()
'Ciao!'
>>> 

...you can import the module and use its functions, classes, and so on. Other import (and from) statements will then find the module as sys.modules['ciao'], so you won't need to repeat this sequence of operations (indeed you don't need this last import statement here if all you want is to ensure the module is available for import from elsewhere -- I'm adding it only to show it works;-).

Edit: If you absolutely must import in this way packages and modules therefrom, rather than "plain modules" as I just showed, that's doable, too, but a bit more complicated. As this answer is already pretty long, and I hope you can simplify your life by sticking to plain modules for this purpose, I'm going to shirk that part of the answer;-).

Also note that this may or may not do what you want in cases of "loading the same module from memory multiple times" (this rebuilds the module each time; you might want to check sys.modules and just skip everything if the module's already there) and in particular when such repeated "load from memory" occurs from multiple threads (needing locks -- but, a better architecture is to have a single dedicated thread devoted to performing the task, with other modules communicating with it via a Queue).

Finally, there's no discussion of how to install this functionality as a transparent "import hook" which automagically gets involved in the mechanisms of the import statement internals themselves -- that's feasible, too, but not exactly what you're asking about, so here, too, I hope you can simplify your life by doing things the simple way instead, as this answer outlines.

like image 99
Alex Martelli Avatar answered Sep 20 '22 13:09

Alex Martelli


Compiled Python file consist of

  1. magic number (4 bytes) to determine type and version of Python,
  2. timestamp (4 bytes) to check whether we have newer source,
  3. marshaled code object.

To load module you have to create module object with imp.new_module(), execute unmashaled code in new module's namespace and put it in sys.modules. Below in sample implementation:

import sys, imp, marshal

def load_compiled_from_memory(name, filename, data, ispackage=False):
    if data[:4]!=imp.get_magic():
        raise ImportError('Bad magic number in %s' % filename)
    # Ignore timestamp in data[4:8]
    code = marshal.loads(data[8:])
    imp.acquire_lock() # Required in threaded applications
    try:
        mod = imp.new_module(name)
        sys.modules[name] = mod # To handle circular and submodule imports 
                                # it should come before exec.
        try:
            mod.__file__ = filename # Is not so important.
            # For package you have to set mod.__path__ here. 
            # Here I handle simple cases only.
            if ispackage:
                mod.__path__ = [name.replace('.', '/')]
            exec code in mod.__dict__
        except:
            del sys.modules[name]
            raise
    finally:
        imp.release_lock()
    return mod

Update: the code is updated to handle packages properly.

Note that you have to install import hook to handle imports inside loaded modules. One way to do this is adding your finder into sys.meta_path. See PEP302 for more information.

like image 32
Denis Otkidach Avatar answered Sep 23 '22 13:09

Denis Otkidach