I saw some commits in a Python code base removing "hourglass imports." I've never seen this term before and I can't find anything about it via the Python documentation or web search.
What are hourglass imports and when would one use or not use them? My best guess is that removing them makes submodules easier to find, but are there other reasons?
An example change removing hourglass imports from one of the linked commits:
diff --git a/tensorflow/contrib/slim/python/slim/nets/vgg.py b/tensorflow/contrib/slim/python/slim/nets/vgg.py
index 3c29767f2..d4eb43cbb 100644
--- a/tensorflow/contrib/slim/python/slim/nets/vgg.py
+++ b/tensorflow/contrib/slim/python/slim/nets/vgg.py
@@ -37,13 +37,20 @@ Usage:
@@vgg_16
@@vgg_19
"""
+
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
-import tensorflow as tf
-
-slim = tf.contrib.slim
+from tensorflow.contrib import layers
+from tensorflow.contrib.framework.python.ops import arg_scope
+from tensorflow.contrib.layers.python.layers import layers as layers_lib
+from tensorflow.contrib.layers.python.layers import regularizers
+from tensorflow.contrib.layers.python.layers import utils
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import init_ops
+from tensorflow.python.ops import nn_ops
+from tensorflow.python.ops import variable_scope
def vgg_arg_scope(weight_decay=0.0005):
The top-level tensorflow __init__.py exports the symbols from the submodules.
# tensorflow/python/__init__.py
...
from tensorflow.python.ops.standard_ops import *
...
# tensorflow/python/ops/standard_ops.py
...
from tensorflow.python.ops.array_ops import *
from tensorflow.python.ops.check_ops import *
from tensorflow.python.ops.clip_ops import *
...
TensorFlow contributor here :wave:. We use the term hourglass import to refer to modules that import a bunch of things from other modules and re-export them. You’ve provided a good example in your question.
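As a rough illustration of the pattern itself (not TensorFlow's actual code), here is a minimal sketch that builds three in-memory modules. The module names `demo_array_ops`, `demo_check_ops`, and `demo_standard_ops`, and the helper `make_module`, are made up for this example. The third module defines nothing of its own and only re-exports the other two, which is what makes it an hourglass module.

```python
import sys
import types


def make_module(name, source):
    """Create an in-memory module from source text and register it in sys.modules."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


# Hypothetical stand-ins for internal op modules.
make_module("demo_array_ops", "def reshape(x):\n    return 'reshape'\n")
make_module("demo_check_ops", "def assert_equal(a, b):\n    return a == b\n")

# The hourglass module: it only re-exports symbols from the modules above.
standard_ops = make_module(
    "demo_standard_ops",
    "from demo_array_ops import *\nfrom demo_check_ops import *\n",
)

# Public names a client sees when it imports the hourglass module.
exported = sorted(name for name in vars(standard_ops) if not name.startswith("_"))
print(exported)
```

A client that imports only `demo_standard_ops` can reach both functions, but it now depends on both underlying modules whether it uses them or not.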
The reason that we care about this and the reason that we call it an hourglass both have to do with the shape of the build graph. The whole point of the hourglass module is that lots of users will depend on it as a convenient entry point. And it itself depends on lots of internal symbols. So your dependency graph has a lot of edges going through this one node, funnelled as through the center of an hourglass:
In a real-world context, the hourglass will be both wider and deeper than this, on both sides. End users may define libraries that depend on :standard_ops and binaries that depend on those libraries, and the internal ops may themselves have layers of dependency.
The problem with this is that it makes it hard to cheaply and correctly re-build in response to changes. If we change part of :check_ops, then it looks like :standard_ops needs to be re-built, because one of its dependencies has changed. And because :standard_ops has been re-built, so too must everything that depends on it. But now we’ve re-built all the end-user programs, even if they didn’t actually use the functionality provided by :check_ops at all. We say that the build graph overapproximates the actual dependency graph. Overapproximation is sound (the builds will still be correct), but it can be wasteful.
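To make the over-rebuilding concrete, here is a minimal sketch of the reasoning, not of how any real build tool works internally. The op targets are the ones named above. The three client targets and the `affected_by` helper are hypothetical.

```python
from collections import deque

# Each target maps to the targets it depends on.
# The :client_* targets are hypothetical end-user programs.
DEPS = {
    ":client_a": [":standard_ops"],
    ":client_b": [":standard_ops"],
    ":client_c": [":standard_ops"],
    ":standard_ops": [":array_ops", ":check_ops", ":clip_ops"],
    ":array_ops": [],
    ":check_ops": [],
    ":clip_ops": [],
}


def affected_by(changed, deps):
    """Return every target that transitively depends on `changed`."""
    # Invert the edges so we can walk from a target to its dependents.
    reverse = {target: [] for target in deps}
    for target, children in deps.items():
        for child in children:
            reverse[child].append(target)

    seen, queue = set(), deque([changed])
    while queue:
        for parent in reverse[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(affected_by(":check_ops", DEPS)))
# → [':client_a', ':client_b', ':client_c', ':standard_ops']
```

Every client shows up as affected, because the only path from any client to any op runs through :standard_ops.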
This is a problem on large codebases like TensorFlow, where we have many thousands of tests, we run all affected tests when you change any code, and the tests can be expensive. If your estimate of “which tests are affected by this change?” is a vast overapproximation due to an hourglass dependency, you’re wasting a lot of compute power on tests, and your developers also have to wait longer to merge their changes.
The patch in your original question shows how we might remove an hourglass dependency and rewrite the clients to point directly to those parts of the build graph that they actually use:
This way, if :check_ops is changed, we can see that we only need to re-build and re-test one client.
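A small sketch of the same idea in code, assuming a made-up set of three clients that each depend directly on the ops they use. The client names and the `dependents` helper are hypothetical and only serve the illustration.

```python
# Each target maps to the targets it depends on. There is no hourglass node:
# the hypothetical clients point straight at the ops they actually use.
DIRECT_DEPS = {
    ":client_a": [":array_ops"],
    ":client_b": [":check_ops"],
    ":client_c": [":array_ops", ":clip_ops"],
    ":array_ops": [],
    ":check_ops": [],
    ":clip_ops": [],
}


def dependents(changed, deps):
    """Return every target that transitively depends on `changed` (acyclic graphs only)."""
    direct = {target for target, children in deps.items() if changed in children}
    result = set(direct)
    for target in direct:
        result |= dependents(target, deps)
    return result


print(sorted(dependents(":check_ops", DIRECT_DEPS)))
# → [':client_b']
```

With precise edges, a change to :check_ops touches only the client that really uses it, so only that client's tests need to run.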
There are benefits and drawbacks to this. For real end users, having to directly import lots of internals is annoying. That’s not a nice API, not nearly as nice as import numpy as np or import tensorflow as tf. Furthermore, it exposes implementation details, making it harder for us to move around those modules. So, for these reasons, we do still provide an hourglass import to users, both publicly and within Google.
However, we try not to use hourglass imports within our own codebase. Breaking changes aren’t an issue within our own repository, since if we want to rename something we can just rename all its clients at the same time. And we have tools for working with our build graphs and are comfortable doing so, which is something that most Python programmers don’t want to have to worry about. The tools are pretty nice, though: in addition to generating nice visual graphs (as above) for your real codebase, they underlie a powerful query engine, where you can ask the system questions like “what targets that transitively depend on :foo are still running on Python 2 and belong to my team?”. This is more powerful when your build graph is more precise.
TL;DR: An hourglass module is one that bundles up imports from many submodules and exposes them to many client modules. We avoid them because they cause the build graph to overapproximate the real dependencies, which makes it more expensive to run tests and harder to analyze the code.