 

Pytest mocker patch - how to troubleshoot?

I am having what I believe is a common issue with mock patching: I cannot figure out the right thing to patch.

I have two questions that I am hoping for help with.

  1. Thoughts on how to fix the specific issue in the example below.
  2. Possibly more importantly, tips, pointers, or suggestions on how best to troubleshoot the "which thing do I patch" question. The problem I'm having is that, without a full understanding of how patching works, I don't even know what I should be looking for and find myself playing a guessing game.

An example using pyarrow that is currently causing me pain:

mymodule.py

import pyarrow

class HdfsSearch:
    def __init__(self):
        self.fs = self._connect()

    def _connect(self) -> object:
        return pyarrow.hdfs.connect(driver="libhdfs")

    def search(self, path: str):
        return self.fs.ls(path=path)

test_module.py

import pyarrow
import pytest

from mymodule import HdfsSearch

@pytest.fixture()
def hdfs_connection_fixture(mocker):
    mocker.patch("pyarrow.hdfs.connect")
    yield HdfsSearch()

def test_hdfs_connection(hdfs_connection_fixture):
    pyarrow.hdfs.connect.assert_called_once() # <-- succeeds

def test_hdfs_search(hdfs_connection_fixture):
    hdfs_connection_fixture.search(".")
    pyarrow.hdfs.HadoopFileSystem.ls.assert_called_once() # <-- fails

pytest output:

$ python -m pytest --verbose test_module.py
=========================================================================================================== test session starts ============================================================================================================
platform linux -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0 -- /home/bbaur/miniconda3/envs/dev/bin/python
cachedir: .pytest_cache
rootdir: /home/user1/work/app
plugins: cov-2.7.1, mock-1.10.4
collected 2 items

test_module.py::test_hdfs_connection PASSED                                                                                                                                                                                          [ 50%]
test_module.py::test_hdfs_search FAILED                                                                                                                                                                                              [100%]

================================================================================================================= FAILURES =================================================================================================================
_____________________________________________________________________________________________________________ test_hdfs_search _____________________________________________________________________________________________________________

hdfs_connection_fixture = <mymodule.HdfsSearch object at 0x7fdb4ec2a610>

    def test_hdfs_search(hdfs_connection_fixture):
        hdfs_connection_fixture.search(".")
>       pyarrow.hdfs.HadoopFileSystem.ls.assert_called_once()
E       AttributeError: 'function' object has no attribute 'assert_called_once'

test_module.py:16: AttributeError
user9074332 asked Sep 13 '19 18:09



2 Answers

You're not calling the assert on the Mock object. This is the correct assert:

hdfs_connection_fixture.fs.ls.assert_called_once()

Explanation:

When you access any attribute in a Mock object it will return another Mock object.

Since you patched "pyarrow.hdfs.connect", you've replaced it with a Mock; let's call it Mock A. Your _connect method calls Mock A, and calling a Mock returns its return_value, which is itself another Mock. That return value is what gets assigned to self.fs.

Now let's break down what's happening in the search method when you call self.fs.ls.

self.fs is Mock A's return_value, and accessing .ls on it returns yet another Mock object; let's call it Mock B. Mock B is the object that gets called with (path=path).

In your assert you're trying to access pyarrow.hdfs.HadoopFileSystem, but it was never patched, so its ls is still a plain function with no assert_called_once attribute. You need to do the assert on Mock B, which lives at hdfs_connection_fixture.fs.ls.
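The chain of mocks can be reproduced without pyarrow. This is a minimal sketch using only unittest.mock; the connect object below is a hand-built stand-in for the patched function, not the real pyarrow API.

```python
from unittest.mock import Mock

# Stand-in for the patched pyarrow.hdfs.connect.
connect = Mock(name="connect")

# Calling a Mock returns its return_value, which is another Mock.
fs = connect(driver="libhdfs")
assert fs is connect.return_value
assert fs is not connect

# Attribute access creates a child Mock on first use and
# returns that same child on every later access.
fs.ls(path=".")
assert fs.ls is connect.return_value.ls

# The call was recorded on the child Mock, so the assertion belongs there.
fs.ls.assert_called_once_with(path=".")
connect.assert_called_once_with(driver="libhdfs")
```

When an assertion method is missing, checking isinstance(obj, Mock) on the object you are asserting against is a quick way to find out whether you are holding a mock or the real thing.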

What to Patch

If you change your import in mymodule.py to "from pyarrow.hdfs import connect", your patch will stop working.

Why is that?

When you patch something you're changing what a name points to, not the actual object.

Your current patch is patching the name pyarrow.hdfs.connect, and in mymodule you're looking up the same name pyarrow.hdfs.connect at call time, so everything is fine.

However, if you use "from pyarrow.hdfs import connect", mymodule will have imported the real pyarrow.hdfs.connect and created a reference to it under the name mymodule.connect.

So when you call connect inside mymodule, you're accessing the name mymodule.connect, which is not patched.

That is why you would need to patch mymodule.connect when using "from ... import".
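The effect of name binding can be seen in isolation. The sketch below builds two throwaway modules in memory; fakelib and fakeclient are hypothetical names standing in for pyarrow.hdfs and mymodule, so nothing here depends on pyarrow being installed.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical library module standing in for pyarrow.hdfs.
lib = types.ModuleType("fakelib")
exec("def connect():\n    return 'real'\n", lib.__dict__)
sys.modules["fakelib"] = lib

# Hypothetical client module that uses "from fakelib import connect".
client = types.ModuleType("fakeclient")
exec(
    "from fakelib import connect\n\ndef run():\n    return connect()\n",
    client.__dict__,
)
sys.modules["fakeclient"] = client

# Patching the library's name leaves the client's own reference untouched.
with patch("fakelib.connect", return_value="fake"):
    result_lib_patch = client.run()

# Patching the name in the client's namespace is what the client sees.
with patch("fakeclient.connect", return_value="fake"):
    result_client_patch = client.run()

print(result_lib_patch, result_client_patch)
```

The first call still reaches the real function, while the second one gets the mock's return value.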

I'd recommend using from x import y when doing this kind of patching. It makes it more explicit what you're trying to mock and the patch will be limited to that module only, which can prevent unforeseen side-effects.

Source: the "Where to patch" section of the Python unittest.mock documentation.

Gabriel Cappelli answered Sep 23 '22 11:09



To understand how patching works in Python, let's first understand the import statement.

When we use import pyarrow in a module (mymodule.py in this case), it performs two operations:

  1. It searches for the pyarrow module in sys.modules, loading the module first if it is not there yet.
  2. It binds the result of that search to a name (pyarrow) in the local scope, by doing something like: pyarrow = sys.modules['pyarrow']

NOTE: an import statement in Python doesn't necessarily execute code. It brings a name into the local scope. The module's code is executed, as a side effect, only when Python can't find the module in sys.modules and has to load it.
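The two steps can be checked directly. This is a small sketch that uses the standard-library json module as a stand-in for pyarrow, so it runs anywhere.

```python
import sys

import json  # first import: loads the module if needed, then binds the name

# The local name is just a reference to the entry in sys.modules.
assert json is sys.modules["json"]

import json as json_again  # later import: a sys.modules lookup, no reload

same_object = json_again is json
print(same_object)
```

Both names refer to one module object, which is why replacing an attribute on that object is visible to every module that reaches it through the module name.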

So, to patch the pyarrow imported in mymodule.py, we need to patch the pyarrow name bound in mymodule's namespace:

patch('mymodule.pyarrow', autospec=True)

test_module.py

import pytest
from unittest.mock import Mock, sentinel
from pyarrow import hdfs

from mymodule import HdfsSearch


class TestHdfsSearch(object):
    @pytest.fixture(autouse=True, scope='function')
    def setup(self, mocker):
        self.hdfs_mock = Mock(name='HadoopFileSystem', spec=hdfs.HadoopFileSystem)
        self.connect_mock = mocker.patch("mymodule.pyarrow.hdfs.connect", return_value=self.hdfs_mock)

    def test_initialize_HdfsSearch_should_connect_pyarrow_hdfs_file_system(self):
        HdfsSearch()

        self.connect_mock.assert_called_once_with(driver="libhdfs")

    def test_initialize_HdfsSearch_should_set_pyarrow_hdfs_as_file_system(self):
        hdfs_search = HdfsSearch()

        assert self.hdfs_mock == hdfs_search.fs

    def test_search_should_retrieve_directory_contents(self):
        hdfs_search = HdfsSearch()
        self.hdfs_mock.ls.return_value = sentinel.contents

        result = hdfs_search.search(".")

        self.hdfs_mock.ls.assert_called_once_with(path=".")
        assert sentinel.contents == result
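The setup fixture above builds its mock with spec=hdfs.HadoopFileSystem. As a rough illustration of what spec adds, here is a sketch with a hypothetical FileSystem class in place of the pyarrow one: a spec'd mock rejects attributes the real class does not have, which turns a wrong guess into an immediate error.

```python
from unittest.mock import Mock


class FileSystem:  # hypothetical stand-in for hdfs.HadoopFileSystem
    def ls(self, path):
        raise NotImplementedError


fs_mock = Mock(spec=FileSystem)

# Attributes that exist on the spec class behave like normal child mocks.
fs_mock.ls(path=".")
fs_mock.ls.assert_called_once_with(path=".")

# Attributes that do not exist on the spec class raise AttributeError.
try:
    fs_mock.list_dir  # not defined on FileSystem
    typo_caught = False
except AttributeError:
    typo_caught = True

print(typo_caught)
```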

Use context managers to patch standard-library names

import os
from unittest.mock import patch

def test_patch_built_ins():
    # The patch is active only inside the with block.
    with patch('os.curdir') as curdir_mock:
        assert curdir_mock == os.curdir
    assert os.curdir == '.'  # restored on exit
Ravi Sharma answered Sep 22 '22 11:09
