Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Python how can one tell if a module comes from a C extension?

What is the correct or most robust way to tell from Python if an imported module comes from a C extension as opposed to a pure Python module? This is useful, for example, if a Python package has a module with both a pure Python implementation and a C implementation, and you want to be able to tell at runtime which one is being used.

One idea is to examine the file extension of module.__file__, but I'm not sure all the file extensions one should check for and if this approach is necessarily the most reliable.

like image 823
cjerdonek Avatar asked Dec 02 '13 22:12

cjerdonek


People also ask

How can I tell where a Python module is imported?

For a pure Python module, we can locate its source by module_name. __file__. This will return the location where the module's . py file exists.

What are C extensions in Python?

Any code that you write using any compiled language like C, C++, or Java can be integrated or imported into another Python script. This code is considered as an "extension." A Python extension module is nothing more than a normal C library. On Unix machines, these libraries usually end in .

What is the extension of module in Python?

In Python, Modules are simply files with the “. py” extension containing Python code that can be imported inside another Python Program. In simple terms, we can consider a module to be the same as a code library or a file that contains a set of functions that you want to include in your application.


1 Answers

tl;dr

See the "In Search of Perfection" subsection below for the well-tested answer.

As a pragmatic counterpoint to abarnert's helpful analysis of the subtlety involved in portably identifying C extensions, Stack Overflow Productions™ presents... an actual answer.

The capacity to reliably differentiate C extensions from non-C extensions is incredibly useful, without which the Python community would be impoverished. Real-world use cases include:

  • Application freezing, converting one cross-platform Python codebase into multiple platform-specific executables. PyInstaller is the standard example here. Identifying C extensions is critical to robust freezing. If a module imported by the codebase being frozen is a C extension, all external shared libraries transitively linked to by that C extension must be frozen with that codebase as well. Shameful confession: I contribute to PyInstaller.
  • Application optimization, either statically to native machine code (e.g., Cython) or dynamically in a just-in-time manner (e.g., Numba). For self-evident reasons, Python optimizers necessarily differentiate already compiled C extensions from uncompiled pure-Python modules.
  • Dependency analysis, inspecting external shared libraries on behalf of end users. In our case, we analyze a mandatory dependency (Numpy) to detect local installations of this dependency linking against non-parallelized shared libraries (e.g., the reference BLAS implementation) and inform end users when this is the case. Why? Because we don't want the blame when our application underperforms due to improper installation of dependencies over which we have no control. Bad performance is your fault, hapless user!
  • Probably other essential low-level stuff. Profiling, maybe?

We can all agree that freezing, optimization, and minimizing end user complaints are useful. Ergo, identifying C extensions is useful.

The Disagreement Deepens

I also disagree with abarnert's penultimate conclusion that:

The best heuristics anyone has come up with for this are the ones implemented in the inspect module, so the best thing to do is to use that.

No. The best heuristics anyone has come up with for this are those given below. All stdlib modules (including but not limited to inspect) are useless for this purpose. Specifically:

  • The inspect.getsource() and inspect.getsourcefile() functions ambiguously return None for both C extensions (which understandably have no pure-Python source) and other types of modules that also have no pure-Python source (e.g., bytecode-only modules). Useless.
  • importlib machinery only applies to modules loadable by PEP 302-compliant loaders and hence visible to the default importlib import algorithm. Useful, but hardly generally applicable. The assumption of PEP 302 compliance breaks down when the real world hits your package in the face repeatedly. For example, did you know that the __import__() built-in is actually overriddable? This is how we used to customize Python's import mechanism – back when the Earth was still flat.

abarnert's ultimate conclusion is also contentious:

…there is no perfect answer.

There is a perfect answer. Much like the oft-doubted Triforce of Hyrulean legend, a perfect answer exists for every imperfect question.

Let's find it.

In Search of Perfection

The pure-Python function that follows returns True only if the passed previously imported module object is a C extension: For simplicity, Python 3.x is assumed.

import inspect, os
from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
from types import ModuleType

def is_c_extension(module: ModuleType) -> bool:
    '''
    `True` only if the passed module is a C extension implemented as a
    dynamically linked shared library specific to the current platform.

    Parameters
    ----------
    module : ModuleType
        Previously imported module object to be tested.

    Returns
    ----------
    bool
        `True` only if this module is a C extension.
    '''
    assert isinstance(module, ModuleType), '"{}" not a module.'.format(module)

    # If this module was loaded by a PEP 302-compliant CPython-specific loader
    # loading only C extensions, this module is a C extension.
    if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
        return True

    # Else, fallback to filetype matching heuristics.
    #
    # Absolute path of the file defining this module.
    module_filename = inspect.getfile(module)

    # "."-prefixed filetype of this path if any or the empty string otherwise.
    module_filetype = os.path.splitext(module_filename)[1]

    # This module is only a C extension if this path's filetype is that of a
    # C extension specific to the current platform.
    return module_filetype in EXTENSION_SUFFIXES

If it looks long, that's because docstrings, comments, and assertions are good. It's actually only six lines. Eat your elderly heart out, Guido.

Proof in the Pudding

Let's unit test this function with four portably importable modules:

  • The stdlib pure-Python os.__init__ module. Hopefully not a C extension.
  • The stdlib pure-Python importlib.machinery submodule. Hopefully not a C extension.
  • The stdlib _elementtree C extension.
  • The third-party numpy.core.multiarray C extension.

To wit:

>>> import os
>>> import importlib.machinery as im
>>> import _elementtree as et
>>> import numpy.core.multiarray as ma
>>> for module in (os, im, et, ma):
...     print('Is "{}" a C extension? {}'.format(
...         module.__name__, is_c_extension(module)))
Is "os" a C extension? False
Is "importlib.machinery" a C extension? False
Is "_elementtree" a C extension? True
Is "numpy.core.multiarray" a C extension? True

All's well that ends.

How to do this?

The details of our code are quite inconsequential. Very well, where do we begin?

  1. If the passed module was loaded by a PEP 302-compliant loader (the common case), the PEP 302 specification requires the attribute assigned on importation to this module to define a special __loader__ attribute whose value is the loader object loading this module. Hence:
    1. If this value for this module is an instance of the CPython-specific importlib.machinery.ExtensionFileLoader class, this module is a C extension.
  2. Else, either (A) the active Python interpreter is not the official CPython implementation (e.g., PyPy) or (B) the active Python interpreter is CPython but this module was not loaded by a PEP 302-compliant loader, typically due to the default __import__() machinery being overridden (e.g., by a low-level bootloader running this Python application as a platform-specific frozen binary). In either case, fallback to testing whether this module's filetype is that of a C extension specific to the current platform.

Eight line functions with twenty page explanations. Thas just how we rolls.

like image 89
Cecil Curry Avatar answered Sep 22 '22 05:09

Cecil Curry