In the following session using numpy, dir(np) returns duplicate entries. Is this a bug? Is dir() allowed/expected to return duplicates?
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> len(dir(np))
620
>>> np.testing
<module 'numpy.testing' from 'C:\\Python\\Python38\\lib\\site-packages\\numpy\\testing\\__init__.py'>
>>> len(dir(np))
621
>>> [i for i in dir(np) if i == "testing"]
['testing', 'testing']
>>> np.__version__
'1.18.1'
>>>
This looks to be a (relatively harmless) bug triggered by an optimization numpy put in to avoid the expense of importing the testing subpackage eagerly, while still providing testing and Tester as attributes of the root numpy package.
The optimization uses a module-level __getattr__ (only available on Python 3.7+, so it's not used on 3.6 and earlier) to import them only if they're explicitly accessed (at which point testing becomes a real attribute of numpy, as child modules and packages are attached to their parent as attributes on import automatically). To continue pretending they're imported eagerly, it also defines a module-level __dir__ that pretends they already exist:
def __dir__():
    return list(globals().keys()) + ['Tester', 'testing']
The flaw here is that, if numpy.testing is imported (either explicitly, or implicitly through the __getattr__ hook), then it already appears in globals(), so adding ['Tester', 'testing'] to the list adds a second copy of 'testing' to the result of dir.
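The duplication can be reproduced without numpy. The following is a minimal sketch, not numpy's actual code: `fakepkg` and `make_lazy_module` are hypothetical names, and a throwaway `types.ModuleType` object stands in for a real package so the snippet runs on its own (Python 3.7+).

```python
import types


def make_lazy_module():
    """Build a throwaway module that mimics the lazy-attribute pattern."""
    mod = types.ModuleType("fakepkg")  # hypothetical name, not a real package

    def __getattr__(name):
        # Module-level __getattr__ (PEP 562): called only when normal lookup fails.
        if name == "testing":
            # Stand-in for the real import; afterwards 'testing' is a real attribute.
            mod.testing = types.ModuleType("fakepkg.testing")
            return mod.testing
        raise AttributeError(f"module 'fakepkg' has no attribute {name!r}")

    def __dir__():
        # Same shape as the flawed hook: concatenate without deduplicating.
        return list(mod.__dict__.keys()) + ["testing"]

    # Assigning on the module stores both hooks in its __dict__,
    # which is where the module machinery looks for them.
    mod.__getattr__ = __getattr__
    mod.__dir__ = __dir__
    return mod


mod = make_lazy_module()
before = dir(mod).count("testing")  # listed only by the __dir__ hook
mod.testing                         # triggers __getattr__, creating the attribute
after = dir(mod).count("testing")   # now listed by __dict__ and by the hook
print(before, after)                # prints: 1 2
```

Before the first access, 'testing' comes only from the hook; after it, the name is in the module's namespace as well, so the concatenation lists it twice.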
They could trivially fix this by deduping (before converting to list, or just omitting the conversion, since dir is already documented to perform the conversion automatically), rather than concatenating after, e.g.:
def __dir__():
    return globals().keys() | {'Tester', 'testing'}
but it's not a serious bug; code that breaks because dir produces a doubled result is likely pretty brittle and buggy from the get-go.
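To check that the set-union approach behaves as described, here is a small sketch under the same assumptions as before: `fakepkg` is a hypothetical stand-in module, not numpy itself. It shows that the hook can return a set and that dir still produces a sorted list with each name once.

```python
import types

mod = types.ModuleType("fakepkg")  # hypothetical stand-in for the package
# Simulate the state after the subpackage has already been imported.
mod.testing = types.ModuleType("fakepkg.testing")


def deduped_dir():
    # A dict keys view supports set operations; the union is a set,
    # so a name present in both operands appears only once.
    return mod.__dict__.keys() | {"Tester", "testing"}


mod.__dir__ = deduped_dir

names = dir(mod)  # dir() converts the returned iterable to a sorted list
print(names.count("testing"), "Tester" in names)  # prints: 1 True
```

Returning a set works here because dir builds and sorts a list from whatever iterable the hook hands back, so the explicit list() call in the hook is not needed.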
The full explanation for this optimization is in the source comments:
# Importing Tester requires importing all of UnitTest which is not a
# cheap import Since it is mainly used in test suits, we lazy import it
# here to save on the order of 10 ms of import time for most users
#
# The previous way Tester was imported also had a side effect of adding
# the full `numpy.testing` namespace
#
# module level getattr is only supported in 3.7 onwards
# https://www.python.org/dev/peps/pep-0562/
On 3.6 and earlier, the code path defining __getattr__ and __dir__ is skipped, and all it does is:
# We don't actually use this ourselves anymore, but I'm not 100% sure that
# no-one else in the world is using it (though I hope not)
from .testing import Tester
which means testing and Tester are "real" attributes from the get-go, and the bug doesn't arise.