Python

Question

In the following session using numpy, dir(np) returns duplicate entries. Is this a bug? Is dir() allowed/expected to return duplicates?

Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> len(dir(np))
620
>>> np.testing
<module 'numpy.testing' from 'C:\Python\Python38\lib\site-packages\numpy\testing\__init__.py'>
>>> len(dir(np))
621
>>> [i for i in dir(np) if i == "testing"]
['testing', 'testing']
>>> np.__version__
'1.18.1'
>>>

ShadowRanger · Accepted Answer

This looks to be a (relatively harmless) bug triggered by an optimization numpy put in to avoid the expense of importing the testing subpackage eagerly, while still providing testing and Tester as attributes of the root numpy package.

The optimization uses a module-level __getattr__ (only available on Python 3.7+, so it's not used on 3.6 and earlier) to import them only if they're explicitly accessed. At that point testing becomes a real attribute of numpy, since child modules and packages are automatically attached to their parent as attributes on import. To keep pretending they're imported eagerly, it also defines a module-level __dir__ that reports them as if they already existed:

def __dir__():
    return list(globals().keys()) + ['Tester', 'testing']

The flaw here is that, if numpy.testing is imported (either explicitly, or implicitly through the __getattr__ hook), then it already appears in globals(), so adding ['Tester', 'testing'] to the list adds a second copy of 'testing' to the result of dir.
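The mechanism can be reproduced without numpy. The following is a minimal sketch, not numpy's actual code: `fakepkg`, `_lazy_getattr`, and `_buggy_dir` are hypothetical names, and the submodule is created by hand to imitate what the import system does when it attaches a child module to its parent. It requires Python 3.7+, since it relies on module-level `__getattr__` and `__dir__`.

```python
import types

# Hypothetical stand-in for the numpy package (not numpy's real code).
pkg = types.ModuleType("fakepkg")


def _lazy_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    if name == "testing":
        # Imitate the import system attaching the child module to its parent.
        submodule = types.ModuleType("fakepkg.testing")
        pkg.__dict__["testing"] = submodule
        return submodule
    raise AttributeError(f"module 'fakepkg' has no attribute {name!r}")


def _buggy_dir():
    # Same shape as the __dir__ shown above: concatenate without deduping.
    return list(pkg.__dict__.keys()) + ["Tester", "testing"]


# Assigning attributes on a module stores them in its __dict__, which is
# where the module-level hooks are looked up.
pkg.__getattr__ = _lazy_getattr
pkg.__dir__ = _buggy_dir

before = dir(pkg).count("testing")  # only the hard-coded entry
pkg.testing                         # triggers the lazy "import"
after = dir(pkg).count("testing")   # real attribute + hard-coded entry

print(before, after)
```

Before the attribute is touched, 'testing' appears once (from the hard-coded list); afterwards it is also a key of the module's namespace, so it shows up twice.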

They could trivially fix this by deduping (before converting to list, or just omitting the conversion, since dir is already documented to perform the conversion automatically), rather than concatenating after, e.g.:

def __dir__():
    return globals().keys() | {'Tester', 'testing'}

but it's not a serious bug; code that breaks because dir produces a doubled result is likely pretty brittle and buggy from the get-go.
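To illustrate the set-union variant, here is a small sketch under the same assumptions as before: `fakepkg` and `_fixed_dir` are hypothetical names, and the `testing` attribute is set by hand to stand in for an already-imported submodule. It relies on `dir()` turning whatever `__dir__` returns into a sorted list.

```python
import types

# Hypothetical package whose "testing" submodule is already imported.
pkg = types.ModuleType("fakepkg")
pkg.testing = types.ModuleType("fakepkg.testing")


def _fixed_dir():
    # dict keys views support set operations; the union drops duplicates.
    return pkg.__dict__.keys() | {"Tester", "testing"}


pkg.__dir__ = _fixed_dir

names = dir(pkg)  # dir() converts the returned set to a sorted list
print(names.count("testing"))
print(type(names).__name__)
```

Because the union is computed on a set, it does not matter whether 'testing' is already a real attribute; each name appears exactly once either way.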

The full explanation for this optimization is in the source comments:

    # Importing Tester requires importing all of UnitTest which is not a
    # cheap import Since it is mainly used in test suits, we lazy import it
    # here to save on the order of 10 ms of import time for most users
    #
    # The previous way Tester was imported also had a side effect of adding
    # the full `numpy.testing` namespace
    #
    # module level getattr is only supported in 3.7 onwards
    # https://www.python.org/dev/peps/pep-0562/

On 3.6 and earlier, the code path defining __getattr__ and __dir__ is skipped, and all it does is:

    # We don't actually use this ourselves anymore, but I'm not 100% sure that
    # no-one else in the world is using it (though I hope not)
    from .testing import Tester

which means testing and Tester are "real" attributes from the get-go, and the bug doesn't arise.

Python - Duplicates in dir() Is it a bug?

Tags:

numpy

PyScripter
