What is the correct or most robust way to tell from Python if an imported module comes from a C extension as opposed to a pure Python module? This is useful, for example, if a Python package has a module with both a pure Python implementation and a C implementation, and you want to be able to tell at runtime which one is being used.
One idea is to examine the file extension of module.__file__
, but I'm not sure all the file extensions one should check for and if this approach is necessarily the most reliable.
For a pure Python module, we can locate its source by module_name. __file__. This will return the location where the module's . py file exists.
Any code that you write using any compiled language like C, C++, or Java can be integrated or imported into another Python script. This code is considered as an "extension." A Python extension module is nothing more than a normal C library. On Unix machines, these libraries usually end in .
In Python, Modules are simply files with the “. py” extension containing Python code that can be imported inside another Python Program. In simple terms, we can consider a module to be the same as a code library or a file that contains a set of functions that you want to include in your application.
tl;dr
See the "In Search of Perfection" subsection below for the well-tested answer.
As a pragmatic counterpoint to abarnert's helpful analysis of the subtlety involved in portably identifying C extensions, Stack Overflow Productions™ presents... an actual answer.
The capacity to reliably differentiate C extensions from non-C extensions is incredibly useful, without which the Python community would be impoverished. Real-world use cases include:
We can all agree that freezing, optimization, and minimizing end user complaints are useful. Ergo, identifying C extensions is useful.
I also disagree with abarnert's penultimate conclusion that:
The best heuristics anyone has come up with for this are the ones implemented in the
inspect
module, so the best thing to do is to use that.
No. The best heuristics anyone has come up with for this are those given below. All stdlib modules (including but not limited to inspect
) are useless for this purpose. Specifically:
inspect.getsource()
and inspect.getsourcefile()
functions ambiguously return None
for both C extensions (which understandably have no pure-Python source) and other types of modules that also have no pure-Python source (e.g., bytecode-only modules). Useless.importlib
machinery only applies to modules loadable by PEP 302-compliant loaders and hence visible to the default importlib
import algorithm. Useful, but hardly generally applicable. The assumption of PEP 302 compliance breaks down when the real world hits your package in the face repeatedly. For example, did you know that the __import__()
built-in is actually overriddable? This is how we used to customize Python's import mechanism – back when the Earth was still flat.
abarnert's ultimate conclusion is also contentious:
…there is no perfect answer.
There is a perfect answer. Much like the oft-doubted Triforce of Hyrulean legend, a perfect answer exists for every imperfect question.
Let's find it.
The pure-Python function that follows returns True
only if the passed previously imported module object is a C extension: For simplicity, Python 3.x is assumed.
import inspect, os
from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
from types import ModuleType
def is_c_extension(module: ModuleType) -> bool:
'''
`True` only if the passed module is a C extension implemented as a
dynamically linked shared library specific to the current platform.
Parameters
----------
module : ModuleType
Previously imported module object to be tested.
Returns
----------
bool
`True` only if this module is a C extension.
'''
assert isinstance(module, ModuleType), '"{}" not a module.'.format(module)
# If this module was loaded by a PEP 302-compliant CPython-specific loader
# loading only C extensions, this module is a C extension.
if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
return True
# Else, fallback to filetype matching heuristics.
#
# Absolute path of the file defining this module.
module_filename = inspect.getfile(module)
# "."-prefixed filetype of this path if any or the empty string otherwise.
module_filetype = os.path.splitext(module_filename)[1]
# This module is only a C extension if this path's filetype is that of a
# C extension specific to the current platform.
return module_filetype in EXTENSION_SUFFIXES
If it looks long, that's because docstrings, comments, and assertions are good. It's actually only six lines. Eat your elderly heart out, Guido.
Let's unit test this function with four portably importable modules:
os.__init__
module. Hopefully not a C extension.
importlib.machinery
submodule. Hopefully not a C extension.
_elementtree
C extension.numpy.core.multiarray
C extension.To wit:
>>> import os
>>> import importlib.machinery as im
>>> import _elementtree as et
>>> import numpy.core.multiarray as ma
>>> for module in (os, im, et, ma):
... print('Is "{}" a C extension? {}'.format(
... module.__name__, is_c_extension(module)))
Is "os" a C extension? False
Is "importlib.machinery" a C extension? False
Is "_elementtree" a C extension? True
Is "numpy.core.multiarray" a C extension? True
All's well that ends.
The details of our code are quite inconsequential. Very well, where do we begin?
__loader__
attribute whose value is the loader object loading this module. Hence:
importlib.machinery.ExtensionFileLoader
class, this module is a C extension.__import__()
machinery being overridden (e.g., by a low-level bootloader running this Python application as a platform-specific frozen binary). In either case, fallback to testing whether this module's filetype is that of a C extension specific to the current platform.Eight line functions with twenty page explanations. Thas just how we rolls.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With