What are hourglass imports and why would they be avoided in a codebase?

Tags: python, coding-style


TensorFlow contributor here :wave:. We use the term hourglass import to refer to modules that import a bunch of things from other modules and re-export them. You’ve provided a good example in your question.
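To make the idea concrete on the Python side, here is a minimal sketch that writes a throwaway package to a temporary directory and imports it. The package and module names (hourglass_demo, check_ops, math_ops, standard_ops) are hypothetical stand-ins, not TensorFlow's real layout. standard_ops plays the hourglass: it re-exports everything from the other two modules, so a client that only wants add still causes check_ops to be loaded.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package: two "internal" modules plus one hourglass module
# (standard_ops) that re-exports everything from both of them.
FILES = {
    "hourglass_demo/__init__.py": "",
    "hourglass_demo/check_ops.py": (
        "def assert_positive(x):\n"
        "    if x <= 0:\n"
        "        raise ValueError('expected a positive value')\n"
        "    return x\n"
    ),
    "hourglass_demo/math_ops.py": (
        "def add(a, b):\n"
        "    return a + b\n"
    ),
    "hourglass_demo/standard_ops.py": (
        "from hourglass_demo.check_ops import *\n"
        "from hourglass_demo.math_ops import *\n"
    ),
}

# Write the package into a fresh temporary directory and make it importable.
root = Path(tempfile.mkdtemp())
for rel_path, source in FILES.items():
    path = root / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(source)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# A client that only needs `add` imports it via the convenient entry point...
standard_ops = importlib.import_module("hourglass_demo.standard_ops")
print(standard_ops.add(2, 3))  # 5

# ...yet check_ops was loaded as well, even though this client never uses it.
print("hourglass_demo.check_ops" in sys.modules)  # True
```

At runtime this costs only an extra import; the concern in this answer is the build-level version of the same edge, where the client's build target ends up depending on check_ops too.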

The reason that we care about this and the reason that we call it an hourglass both have to do with the shape of the build graph. The whole point of the hourglass module is that lots of users will depend on it as a convenient entry point, while it itself depends on lots of internal symbols. So your dependency graph has a lot of edges going through this one node, funnelled through it like sand through the neck of an hourglass.
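One way to see the hourglass shape is to model the build graph as a plain dictionary from each target to its direct dependencies. The sketch below assumes a tiny hypothetical graph (the client and op target names are illustrative only) and counts the hub's fan-in and fan-out, plus how many client-to-internal-op dependency pairs exist, all of which are routed through the single :standard_ops node.

```python
# Hypothetical build graph: each target maps to its direct dependencies.
BUILD_GRAPH = {
    ":client_a": [":standard_ops"],
    ":client_b": [":standard_ops"],
    ":client_c": [":standard_ops"],
    ":standard_ops": [":array_ops", ":check_ops", ":math_ops"],
    ":array_ops": [],
    ":check_ops": [],
    ":math_ops": [],
}
HUB = ":standard_ops"


def transitive_deps(graph, target):
    """Return every target reachable from `target`, excluding itself."""
    seen = set()
    stack = list(graph[target])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen


# The neck of the hourglass: many targets point in, many point out.
fan_in = sum(HUB in deps for deps in BUILD_GRAPH.values())
fan_out = len(BUILD_GRAPH[HUB])

clients = [t for t in BUILD_GRAPH if t.startswith(":client")]
internals = BUILD_GRAPH[HUB]

# Each client's only direct dependency is the hub, so every
# client -> internal-op pair below is reached through it.
pairs = [
    (client, op)
    for client in clients
    for op in internals
    if op in transitive_deps(BUILD_GRAPH, client)
]

print(fan_in, fan_out, len(pairs))  # 3 3 9
```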

In a real-world context, the hourglass will be both wider and deeper than this, on both sides. End-users may define libraries that depend on :standard_ops and binaries that depend on those libraries, and the internal ops may themselves have layers of dependency.

The problem with this is that it makes it hard to re-build cheaply and correctly in response to changes. If we change part of :check_ops, then it looks like :standard_ops needs to be re-built, because one of its dependencies has changed. And because :standard_ops has been re-built, so too must everything that depends on it. But now we’ve re-built all the end-user programs, even if they didn’t actually use the functionality provided by :check_ops at all. We say that the build graph overapproximates the actual dependency graph. Overapproximation is sound (the builds will still be correct), but it can be wasteful.
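The rebuild rule described here amounts to a reverse transitive closure: start from the changed target and walk dependency edges backwards. A minimal sketch, assuming a hypothetical graph shaped like the deeper hourglass described earlier (binaries depend on user libraries, which depend on :standard_ops, which depends on internal ops); all target names are illustrative.

```python
from collections import deque

# Hypothetical hourglass-shaped graph: target -> direct dependencies.
DEPS = {
    ":train_bin": [":model_lib"],
    ":serve_bin": [":serving_lib"],
    ":model_lib": [":standard_ops"],
    ":serving_lib": [":standard_ops"],
    ":standard_ops": [":check_ops", ":math_ops"],
    ":check_ops": [],
    ":math_ops": [],
}


def reverse_deps(deps):
    """Invert the graph: map each target to the targets that depend on it."""
    rdeps = {target: [] for target in deps}
    for target, direct in deps.items():
        for dep in direct:
            rdeps[dep].append(target)
    return rdeps


def affected_by(deps, changed):
    """`changed` plus every target that transitively depends on it (BFS)."""
    rdeps = reverse_deps(deps)
    affected = {changed}
    queue = deque([changed])
    while queue:
        for parent in rdeps[queue.popleft()]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Both binaries are rebuilt, even if (say) the serving code never calls
# anything from :check_ops: the graph only records the edge to the hub.
print(sorted(affected_by(DEPS, ":check_ops")))
# [':check_ops', ':model_lib', ':serve_bin', ':serving_lib', ':standard_ops', ':train_bin']
```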

This is a problem in large codebases like TensorFlow, where we have many thousands of tests, we run all affected tests when you change any code, and the tests can be expensive. If your estimate of “which tests are affected by this change?” is a vast overapproximation due to an hourglass dependency, you’re wasting a lot of compute power on tests, and your developers also have to wait longer to merge their changes.

The patch in your original question shows how we might remove an hourglass dependency and rewrite the clients to depend directly on the parts of the build graph that they actually use.

This way, if :check_ops is changed, we can see that we only need to re-build and re-test one client.
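Running a reverse-dependency walk on a rewritten graph shows the payoff. In this hypothetical graph (illustrative target names), each library depends directly on the internal ops it uses and there is no :standard_ops hub, so a change to :check_ops reaches only the one library that uses it and that library's binary.

```python
# Hypothetical graph after removing the hourglass: each library depends
# directly on the internal ops it actually uses.
DIRECT_DEPS = {
    ":train_bin": [":model_lib"],
    ":serve_bin": [":serving_lib"],
    ":model_lib": [":check_ops", ":math_ops"],
    ":serving_lib": [":math_ops"],
    ":check_ops": [],
    ":math_ops": [],
}


def transitive_deps(deps, target):
    """Every target reachable from `target` by following dependency edges."""
    seen = set()
    stack = list(deps[target])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(deps[dep])
    return seen


def affected_by(deps, changed):
    """`changed` plus every target that transitively depends on it."""
    return {changed} | {t for t in deps if changed in transitive_deps(deps, t)}


# Only the training side is rebuilt; :serving_lib and :serve_bin are untouched.
print(sorted(affected_by(DIRECT_DEPS, ":check_ops")))
# [':check_ops', ':model_lib', ':train_bin']
```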

There are benefits and drawbacks to this. For real end users, having to directly import lots of internals is annoying. That’s not a nice API, not nearly as nice as import numpy as np or import tensorflow as tf. Furthermore, it exposes implementation details, making it harder for us to move those modules around. So, for these reasons, we do still provide an hourglass import to users, both publicly and within Google. However, we try not to use hourglass imports within our own codebase. Breaking changes aren’t an issue within our own repository, since if we want to rename something we can just update all its clients at the same time. And we have tools for working with our build graphs and are comfortable using them, which is something that most Python programmers don’t want to have to worry about.

The tools are pretty nice, though. In addition to generating visual graphs of your real codebase, they underlie a powerful query engine, where you can ask the system questions like “what targets that transitively depend on :foo are still running on Python 2 and belong to my team?”. Such queries are more useful when your build graph is more precise.
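The kind of question quoted above can be approximated over a toy in-memory graph. This is only a sketch of the idea, not the API of Bazel's query language or of Google's internal tooling; the python_version and team attributes, and all target names, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Target:
    deps: list            # names of direct dependencies
    python_version: str   # hypothetical attribute, "PY2" or "PY3"
    team: str             # hypothetical ownership attribute


# Toy build graph; every name and attribute value here is made up.
GRAPH = {
    ":foo": Target([], "PY3", "core"),
    ":lib_a": Target([":foo"], "PY2", "my-team"),
    ":lib_b": Target([":foo"], "PY3", "my-team"),
    ":bin_a": Target([":lib_a"], "PY2", "other-team"),
    ":bin_b": Target([":lib_b"], "PY2", "my-team"),
    ":unrelated": Target([], "PY2", "my-team"),
}


def depends_on(graph, target, dep):
    """True if `target` transitively depends on `dep`."""
    seen = set()
    stack = list(graph[target].deps)
    while stack:
        current = stack.pop()
        if current == dep:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current].deps)
    return False


# "Which targets that transitively depend on :foo are still on Python 2
# and belong to my team?"
matches = sorted(
    name
    for name, target in GRAPH.items()
    if depends_on(GRAPH, name, ":foo")
    and target.python_version == "PY2"
    and target.team == "my-team"
)
print(matches)  # [':bin_b', ':lib_a']
```

In this toy graph :unrelated is excluded because it never reaches :foo. If it instead depended on an hourglass module that depended on :foo, it would match as well, which is the imprecision that makes such query results less useful.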

TL;DR: An hourglass module is one that bundles up imports from many submodules and exposes them to many client modules. We avoid them internally because they make the build graph overapproximate the real dependency graph, which makes tests more expensive to run and the code harder to analyze.


answered Oct 16 '22 17:10

wchargin
