Python 3.6 project structure leads to RuntimeWarning

I'm trying to package up my project for distribution, but I'm hitting a RuntimeWarning when I run the module.

I've found a bug report on the Python mailing list which indicates that the RuntimeWarning is new behaviour that was introduced in Python 3.5.2.

Reading through the bug report, it appears that a double import happens, and this RuntimeWarning is correct in alerting the user. However, I don't see what changes I need to make to my own project structure to avoid this issue.

This is the first project that I have attempted to structure "correctly". I would like to have a tidy layout for when I push the code, and a project structure which can be cloned and run easily by others.

I have based my structure mainly on http://docs.python-guide.org/en/latest/writing/structure/.

I have added details of a minimum working example below.

To replicate the issue, I run the main file with python -m:

(py36) X:\test_proj>python -m proj.proj
C:\Users\Matthew\Anaconda\envs\py36\lib\runpy.py:125: RuntimeWarning: 
'proj.proj' found in sys.modules after import of package 'proj', but prior 
to execution of 'proj.proj'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
This is a test project.

Running my tests works fine:

(py36) X:\test_proj>python -m unittest tests.test_proj
This is a test project.
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

A project structure to replicate the issue is as follows:

myproject/
    proj/
        __init__.py
        proj.py
    tests/
        __init__.py
        context.py
        test_proj.py

In the file proj/proj.py:

def main():
    print('This is a test project.')
    raise ValueError

if __name__ == '__main__':
    main()

In proj/__init__.py:

from .proj import main

In tests/context.py:

import os
import sys
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
import proj

Finally, in tests/test_proj.py:

import unittest

from .context import proj


class SampleTestCase(unittest.TestCase):
    """Test case for this sample project"""
    def test_raise_error(self):
        """Test that we correctly raise an error."""
        with self.assertRaises(ValueError):
            proj.main()


if __name__ == '__main__':
    unittest.main()

Can anyone help me correct my project structure to avoid this double-import scenario? Any help with this would be greatly appreciated.

Matthew asked Apr 13 '17 13:04


5 Answers

For this particular case, the double import warning is due to this line in proj/__init__.py:

from .proj import main

What that line means is that by the time the -m switch implementation finishes the import proj step, proj.proj has already been imported as a side effect of importing the parent package.
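To see this in isolation, here is a minimal sketch that rebuilds the layout from the question in a temporary directory and runs it with -m twice: once with the from .proj import main line in __init__.py and once with an empty __init__.py. The helper name run_dash_m is made up for illustration, and main() here only prints (the raise ValueError from the question is left out so the run exits cleanly).

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Same CLI module as in the question, minus the ValueError (assumption: we only
# care about the warning here, not about the test case).
PROJ_SOURCE = (
    "def main():\n"
    "    print('This is a test project.')\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    main()\n"
)


def run_dash_m(init_source):
    """Hypothetical helper: build proj/ with the given __init__.py, run python -m proj.proj."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "proj"
        pkg.mkdir()
        (pkg / "__init__.py").write_text(init_source)
        (pkg / "proj.py").write_text(PROJ_SOURCE)
        # -W always makes sure the warning is not hidden by an existing filter.
        return subprocess.run(
            [sys.executable, "-W", "always", "-m", "proj.proj"],
            cwd=tmp,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )


with_import = run_dash_m("from .proj import main\n")
without_import = run_dash_m("")

print("warning with the import:   ", "RuntimeWarning" in with_import.stderr)
print("warning without the import:", "RuntimeWarning" in without_import.stderr)
```

On an interpreter that has the warning (3.5.2 or later, per the bug report mentioned in the question), only the first run should report it.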

Avoiding the warning

To avoid the warning, you need to find a way to ensure that importing the parent package doesn't implicitly import the submodule being executed with the -m switch.

The two main options for resolving that are:

  1. Drop the from .proj import main line (as @John Moutafis suggested), assuming that can be done without breaking API compatibility guarantees; or
  2. Delete the if __name__ == "__main__": block from the proj submodule and replace it with a separate proj/__main__.py file that just does:

    from .proj import main
    main()
    

If you go with option 2, then the command line invocation would also change to just be python -m proj, rather than referencing a submodule.
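As a rough check of option 2, the sketch below writes the package with a proj/__main__.py into a temporary directory, keeps the from .proj import main line in __init__.py, and runs python -m proj. The file contents are the ones described above; the temporary-directory scaffolding is only there to make the example self-contained.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "proj"
    pkg.mkdir()
    # The package keeps re-exporting main(), so the public API is unchanged.
    (pkg / "__init__.py").write_text("from .proj import main\n")
    # The submodule no longer has an `if __name__ == "__main__":` block.
    (pkg / "proj.py").write_text(
        "def main():\n"
        "    print('This is a test project.')\n"
    )
    # The CLI entry point now lives in __main__.py.
    (pkg / "__main__.py").write_text("from .proj import main\nmain()\n")

    result = subprocess.run(
        [sys.executable, "-W", "always", "-m", "proj"],
        cwd=tmp,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )

print(result.stdout, end="")
print("warning emitted:", "RuntimeWarning" in result.stderr)
```

The import of proj.proj still happens as a side effect of importing the package, but the module that -m executes is now proj.__main__, which nothing has imported beforehand.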

A more backwards compatible variant of option 2 is to add __main__.py without deleting the CLI block from the current submodule, and that can be an especially good approach when combined with DeprecationWarning:

if __name__ == "__main__":
    import warnings
    warnings.warn("use 'python -m proj', not 'python -m proj.proj'", DeprecationWarning)
    main()

If proj/__main__.py is already being used for some other purpose, then you can also do things like replacing python -m proj.proj with python -m proj.proj_cli, where proj/proj_cli.py looks like:

if __name__ != "__main__":
    raise RuntimeError("Only for use with the -m switch, not as a Python API")
from .proj import main
main()

Why does the warning exist?

This warning gets emitted when the -m switch implementation is about to go and run an already imported module's code again in the __main__ module, which means you will have two distinct copies of everything it defines - classes, functions, containers, etc.

Depending on the specifics of the application, this may work fine (which is why it's a warning rather than an error), or it may lead to bizarre behaviour like module level state modifications not being shared as expected, or even exceptions not being caught because the exception handler was trying to catch the exception type from one instance of the module, while the exception raised used the type from the other instance.

Hence the vague "this may result in unpredictable behaviour" warning - if things do go wrong as a result of running the module's top level code twice, the symptoms may be pretty much anything.
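The "two distinct copies" problem can be shown in-process with runpy.run_module, which is the documented function behind the -m switch. This is only a sketch: the package name dupdemo and the class DemoError are invented for the demonstration and have nothing to do with the question's project.

```python
import importlib
import runpy
import sys
import tempfile
import warnings
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "dupdemo"  # hypothetical package name
    pkg.mkdir()
    # Importing the package imports the submodule as a side effect.
    (pkg / "__init__.py").write_text("from .core import DemoError\n")
    (pkg / "core.py").write_text("class DemoError(Exception):\n    pass\n")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            # Roughly what `python -m dupdemo.core` does: import the parent
            # package, then execute core.py again as __main__.
            main_globals = runpy.run_module("dupdemo.core", run_name="__main__")

        imported_cls = sys.modules["dupdemo.core"].DemoError
        main_cls = main_globals["DemoError"]
        same_class = imported_cls is main_cls

        # An exception raised with the __main__ copy of the class is not
        # caught by a handler written against the imported copy.
        try:
            try:
                raise main_cls("raised by the __main__ copy")
            except imported_cls:
                caught_by_handler = True
        except Exception:
            caught_by_handler = False
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("dupdemo.core", None)
        sys.modules.pop("dupdemo", None)

warning_seen = any(issubclass(w.category, RuntimeWarning) for w in caught)
print("RuntimeWarning emitted:", warning_seen)
print("same class object:", same_class)
print("handler caught it:", caught_by_handler)
```

The two DemoError classes have the same source and the same name, but they are different objects, which is exactly the kind of mismatch the warning is hinting at.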

How can you debug more complex cases?

While in this particular example, the side-effect import is directly in proj/__init__.py, there's a far more subtle and hard to debug variant where the parent package instead does:

import some_other_module

and then it is some_other_module (or a module that it imports) that does:

import proj.proj # or "from proj import proj"

Assuming the misbehaviour is reproducible, the main way to debug these kinds of problems is to run python in verbose mode and check the import sequence:

$ python -v -c "print('Hello')" 2>&1 | grep '^import'
import zipimport # builtin
import site # precompiled from /usr/lib64/python2.7/site.pyc
import os # precompiled from /usr/lib64/python2.7/os.pyc
import errno # builtin
import posix # builtin
import posixpath # precompiled from /usr/lib64/python2.7/posixpath.pyc
import stat # precompiled from /usr/lib64/python2.7/stat.pyc
import genericpath # precompiled from /usr/lib64/python2.7/genericpath.pyc
import warnings # precompiled from /usr/lib64/python2.7/warnings.pyc
import linecache # precompiled from /usr/lib64/python2.7/linecache.pyc
import types # precompiled from /usr/lib64/python2.7/types.pyc
import UserDict # precompiled from /usr/lib64/python2.7/UserDict.pyc
import _abcoll # precompiled from /usr/lib64/python2.7/_abcoll.pyc
import abc # precompiled from /usr/lib64/python2.7/abc.pyc
import _weakrefset # precompiled from /usr/lib64/python2.7/_weakrefset.pyc
import _weakref # builtin
import copy_reg # precompiled from /usr/lib64/python2.7/copy_reg.pyc
import traceback # precompiled from /usr/lib64/python2.7/traceback.pyc
import sysconfig # precompiled from /usr/lib64/python2.7/sysconfig.pyc
import re # precompiled from /usr/lib64/python2.7/re.pyc
import sre_compile # precompiled from /usr/lib64/python2.7/sre_compile.pyc
import _sre # builtin
import sre_parse # precompiled from /usr/lib64/python2.7/sre_parse.pyc
import sre_constants # precompiled from /usr/lib64/python2.7/sre_constants.pyc
import _locale # dynamically loaded from /usr/lib64/python2.7/lib-dynload/_localemodule.so
import _sysconfigdata # precompiled from /usr/lib64/python2.7/_sysconfigdata.pyc
import abrt_exception_handler # precompiled from /usr/lib64/python2.7/site-packages/abrt_exception_handler.pyc
import encodings # directory /usr/lib64/python2.7/encodings
import encodings # precompiled from /usr/lib64/python2.7/encodings/__init__.pyc
import codecs # precompiled from /usr/lib64/python2.7/codecs.pyc
import _codecs # builtin
import encodings.aliases # precompiled from /usr/lib64/python2.7/encodings/aliases.pyc
import encodings.utf_8 # precompiled from /usr/lib64/python2.7/encodings/utf_8.pyc

This particular example just shows the base set of imports that Python 2.7 on Fedora does at startup. When debugging a double-import RuntimeWarning like the one in this question, you'd be searching for the "import proj" and then "import proj.proj" lines in the verbose output, and then looking closely at the imports immediately preceding the "import proj.proj" line.
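If grep is not available (for example on Windows, as in the question), the same filtering can be done from Python. The following is a small sketch with a made-up helper name, import_lines; it assumes Python 3, where the verbose lines quote the module name (import 'json' # ...) rather than using the Python 2.7 format shown above. It uses json as a stand-in so that it runs anywhere; for the question's project you would pass ["-m", "proj.proj"] and "proj" instead and run it from the project root.

```python
import subprocess
import sys


def import_lines(python_args, needle):
    """Hypothetical helper: run `python -v <args>` and keep matching import lines."""
    result = subprocess.run(
        [sys.executable, "-v"] + list(python_args),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # The verbose import trace is written to stderr, in import order.
    return [
        line
        for line in result.stderr.splitlines()
        if line.startswith("import ") and needle in line
    ]


json_imports = import_lines(["-c", "import json"], "json")
for line in json_imports:
    print(line)
```

Because the lines come out in import order, the entries just before the one for the double-imported module point at whatever triggered the import.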

ncoghlan answered Nov 14 '22 08:11


If you take a look at the double import trap you will see this:

This next trap exists in all current versions of Python, including 3.3, and can be summed up in the following general guideline: “Never add a package directory, or any directory inside a package, directly to the Python path”.

The reason this is problematic is that every module in that directory is now potentially accessible under two different names: as a top level module (since the directory is on sys.path) and as a submodule of the package (if the higher level directory containing the package itself is also on sys.path).
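A minimal sketch of the trap described in that quote, using invented names (trapdemo, trapmod): both the directory containing the package and the package directory itself are put on sys.path, and the same file is then imported under two names.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "trapdemo"  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "trapmod.py").write_text("STATE = []\n")

    # The trap: the parent directory AND the package directory are on sys.path.
    added = [tmp, str(pkg)]
    sys.path[:0] = added
    try:
        importlib.invalidate_caches()
        as_submodule = importlib.import_module("trapdemo.trapmod")
        as_top_level = importlib.import_module("trapmod")

        # Module-level state changed through one name is invisible via the other.
        as_submodule.STATE.append("written via the package name")
        distinct_modules = as_submodule is not as_top_level
        top_level_state = list(as_top_level.STATE)
    finally:
        for entry in added:
            sys.path.remove(entry)
        for name in ("trapdemo.trapmod", "trapdemo", "trapmod"):
            sys.modules.pop(name, None)

print("two separate module objects:", distinct_modules)
print("state seen via top-level name:", top_level_state)
```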

In tests/context.py, remove this line:

sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

It is probably what causes the problem, and your code should still work as expected without it.


Edit due to comment:

You can try and change some parts in your code:

  1. proj/__init__.py can be completely empty.
  2. In test_proj.py, change the imports as follows:

    import unittest
    
    from proj import proj
    

PS: I wasn't able to reproduce the warning on Linux, either with your initial code or with my suggestions.

John Moutafis answered Nov 14 '22 07:11


@ncoghlan's answer is right. I just want to add to his option 1 that you only need to remove the import in __init__.py if you execute your package with the -m switch. That boils down to figuring out in __init__.py whether python was called with the -m switch. sys.flags unfortunately does not contain an entry for the -m switch, but sys.argv seems to contain a single element containing "-m" (I did not, however, figure out whether this behaviour is documented). So change __init__.py in the following way:

import sys
if '-m' not in sys.argv:
    from .proj import main

If you execute the package with the -m switch, .proj will not be imported by __init__.py and you avoid the double import. If you import the package from another script, .proj is imported as intended. Unfortunately, sys.argv does not contain the argument to the -m switch! So maybe moving the main() function to a separate file is the better solution. But I really like to have a main() function in my modules for quick and simple testing/demonstrations.
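To check what sys.argv actually holds at each stage, here is a small sketch with an invented package (argvdemo) whose __init__.py and CLI module both print sys.argv. As far as I can tell this matches the command line documentation for -m, which says the first element of sys.argv is set to "-m" while the module is being located and to the full path of the module file afterwards.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "argvdemo"  # hypothetical package name
    pkg.mkdir()
    # Runs while -m is still locating the module (parent package import).
    (pkg / "__init__.py").write_text(
        "import sys\nprint('package import sees', sys.argv)\n"
    )
    # Runs as __main__ once the module has been located.
    (pkg / "cli.py").write_text(
        "import sys\nprint('module run sees', sys.argv)\n"
    )

    result = subprocess.run(
        [sys.executable, "-m", "argvdemo.cli", "extra-arg"],
        cwd=tmp,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )

output_lines = result.stdout.splitlines()
for line in output_lines:
    print(line)
```

One caveat with the '-m' in sys.argv test: it also matches when a script is run normally and the user happens to pass -m as one of its own arguments, so comparing sys.argv[0] to '-m' may be the narrower check.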

janscience answered Nov 14 '22 06:11


If you are certain the warning is not relevant for you, an easy way to avoid it is ignoring RuntimeWarnings by the runpy module that implements the logic behind the -m switch:

import sys
import warnings

if not sys.warnoptions:  # allow overriding with `-W` option
    warnings.filterwarnings('ignore', category=RuntimeWarning, module='runpy')

This may obviously hide relevant warnings as well, but at least at the moment this is the only RuntimeWarning that runpy uses. Alternatively, the filter could be made stricter by specifying a pattern for the message or the line number where the warning must occur, but both of these may break if runpy is edited later.
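A sketch of the stricter variant, filtering on the message text as well. The regular expression is an assumption based on the wording of the warning quoted in the question, and the helper name ignore_runpy_double_import is made up; the two warn_explicit calls only simulate warnings coming from runpy so the effect of the filter can be seen.

```python
import warnings

# Assumed pattern, taken from the warning text shown in the question.
# The `message` regex is matched against the start of the warning text.
RUNPY_DOUBLE_IMPORT = r".*found in sys\.modules after import of package"


def ignore_runpy_double_import():
    """Hypothetical helper: silence only runpy's double-import RuntimeWarning."""
    warnings.filterwarnings(
        "ignore",
        message=RUNPY_DOUBLE_IMPORT,
        category=RuntimeWarning,
        module="runpy",
    )


with warnings.catch_warnings(record=True) as recorded:
    warnings.simplefilter("always")
    ignore_runpy_double_import()  # inserted in front of the "always" filter

    # Simulated double-import warning, attributed to the runpy module.
    warnings.warn_explicit(
        "'proj.proj' found in sys.modules after import of package 'proj', "
        "but prior to execution of 'proj.proj'; this may result in "
        "unpredictable behaviour",
        RuntimeWarning,
        filename="runpy.py",
        lineno=1,
        module="runpy",
    )
    # Any other RuntimeWarning from runpy should still get through.
    warnings.warn_explicit(
        "some other runpy warning",
        RuntimeWarning,
        filename="runpy.py",
        lineno=2,
        module="runpy",
    )

recorded_messages = [str(w.message) for w in recorded]
print(recorded_messages)
```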

Pekka Klärck answered Nov 14 '22 08:11


python -m is a bit tricky, and @ncoghlan has already provided detailed information. When we run with python -m, packages are looked up in every directory on sys.path/PYTHONPATH. If your package has an import statement for anything within the directories on those paths, the above warning occurs.

My PYTHONPATH already contains the project directory. Thus when I do

from reader.reader import Reader

the system throws the warning. So there is no need for explicit imports if the path is already on the Python path.

Doogle answered Nov 14 '22 07:11