I am having what I believe is a common issue with mock patching: I cannot figure out the right thing to patch.
I have two questions that I am hoping for help with.
Here is an example using pyarrow that is currently causing me pain:
mymodule.py
import pyarrow


class HdfsSearch:
    def __init__(self):
        self.fs = self._connect()

    def _connect(self) -> object:
        return pyarrow.hdfs.connect(driver="libhdfs")

    def search(self, path: str):
        return self.fs.ls(path=path)
test_module.py
import pyarrow
import pytest
from mymodule import HdfsSearch


@pytest.fixture()
def hdfs_connection_fixture(mocker):
    mocker.patch("pyarrow.hdfs.connect")
    yield HdfsSearch()


def test_hdfs_connection(hdfs_connection_fixture):
    pyarrow.hdfs.connect.assert_called_once()  # <-- succeeds


def test_hdfs_search(hdfs_connection_fixture):
    hdfs_connection_fixture.search(".")
    pyarrow.hdfs.HadoopFileSystem.ls.assert_called_once()  # <-- fails
$ python -m pytest --verbose test_module.py
=========================================================================================================== test session starts ============================================================================================================
platform linux -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0 -- /home/bbaur/miniconda3/envs/dev/bin/python
cachedir: .pytest_cache
rootdir: /home/user1/work/app
plugins: cov-2.7.1, mock-1.10.4
collected 2 items
test_module.py::test_hdfs_connection PASSED [ 50%]
test_module.py::test_hdfs_search FAILED [100%]
================================================================================================================= FAILURES =================================================================================================================
_____________________________________________________________________________________________________________ test_hdfs_search _____________________________________________________________________________________________________________
hdfs_connection_fixture = <mymodule.HdfsSearch object at 0x7fdb4ec2a610>
def test_hdfs_search(hdfs_connection_fixture):
hdfs_connection_fixture.search(".")
> pyarrow.hdfs.HadoopFileSystem.ls.assert_called_once()
E AttributeError: 'function' object has no attribute 'assert_called_once'
test_module.py:16: AttributeError
You're not calling the assert on the Mock object. This is the correct assert:
hdfs_connection_fixture.fs.ls.assert_called_once()
Explanation:
Accessing any attribute on a Mock object returns another Mock object, and so does calling one.
Since you patched "pyarrow.hdfs.connect", you've replaced it with a Mock; let's call it Mock A. Your _connect method calls Mock A and returns the result, which is Mock A's return_value (itself a Mock), and you assign that to self.fs.
Now let's break down what happens in the search method when you call self.fs.ls. self.fs is the Mock returned by calling Mock A, and .ls returns yet another Mock object; let's call it Mock B. Mock B is the object that gets called with (path=path).
In your assert you're trying to access pyarrow.hdfs.HadoopFileSystem, but it was never patched, so ls there is still the real function. You need to do the assert on Mock B, which is at hdfs_connection_fixture.fs.ls.
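The chain of mocks can be reproduced without pyarrow. This is only a minimal sketch: `connect` below is a stand-in for the patched pyarrow.hdfs.connect, not the real function.

```python
from unittest.mock import MagicMock

# Stand-in for the patched pyarrow.hdfs.connect ("Mock A").
connect = MagicMock(name="connect")

# Calling a mock returns its return_value, which is itself a mock.
fs = connect(driver="libhdfs")
assert fs is connect.return_value

# Attribute access creates a child mock once and then reuses it ("Mock B").
fs.ls(path=".")
assert fs.ls is connect.return_value.ls

# The call was recorded on that child mock, so the assertions belong there.
fs.ls.assert_called_once_with(path=".")
connect.assert_called_once_with(driver="libhdfs")
```

Because the same child mock is handed back on every attribute access, the test can reach it later through hdfs_connection_fixture.fs.ls.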
What to Patch
If you change your import in mymodule.py to from pyarrow.hdfs import connect, your patch will stop working.
Why is that?
When you patch something you're changing what a name points to, not the actual object.
Your current patch is patching the name pyarrow.hdfs.connect, and in mymodule you're using the same name pyarrow.hdfs.connect, so everything is fine.
However, if you use from pyarrow.hdfs import connect, mymodule will have imported the real pyarrow.hdfs.connect and created a reference to it with the name mymodule.connect.
So when you call connect inside mymodule you're accessing the name mymodule.connect, which is not patched.
That is why you would need to patch mymodule.connect when using a from-import.
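The name-binding behaviour can be checked in isolation. The sketch below assumes two hypothetical modules, fakelib and fakeclient, built in memory with types.ModuleType so that nothing has to be installed; they stand in for pyarrow.hdfs and mymodule.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical library module standing in for pyarrow.hdfs.
lib = types.ModuleType("fakelib")
lib.connect = lambda: "real connection"
sys.modules["fakelib"] = lib

# Hypothetical client module that did `from fakelib import connect`.
client = types.ModuleType("fakeclient")
client.connect = lib.connect  # what the from-import binding amounts to
client.open_connection = lambda: client.connect()  # looks the name up in fakeclient
sys.modules["fakeclient"] = client

# Patching the library name leaves the client's own reference untouched.
with patch("fakelib.connect", return_value="mocked"):
    result_lib_patch = client.open_connection()

# Patching the name in the client's namespace is what takes effect.
with patch("fakeclient.connect", return_value="mocked"):
    result_client_patch = client.open_connection()

print(result_lib_patch)     # real connection
print(result_client_patch)  # mocked
```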
I'd recommend using from x import y when doing this kind of patching. It makes it more explicit what you're trying to mock, and the patch will be limited to that module only, which can prevent unforeseen side effects.
Source: the "Where to patch" section of the Python unittest.mock documentation.
To understand how patching works in Python, let's first understand the import statement.
When we use import pyarrow in a module (mymodule.py in this case), it does two operations:
1. It looks up the pyarrow module in sys.modules.
2. It binds that module to a name (pyarrow) in the local scope, by doing something like: pyarrow = sys.modules['pyarrow']
NOTE: An import statement in Python does not necessarily execute code. It brings a name into the local scope; the module's code is executed only as a side effect, when Python can't find the module in sys.modules.
So, to patch pyarrow as imported in mymodule.py, we need to patch the pyarrow name present in the local scope of mymodule.py:
patch('mymodule.pyarrow', autospec=True)
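The two import steps can be observed directly. In this sketch, fake_pyarrow is a hypothetical module registered by hand in sys.modules, so the example does not depend on pyarrow being installed.

```python
import sys
import types

# Register a hypothetical module so the example needs no real package.
fake = types.ModuleType("fake_pyarrow")
fake.loaded_marker = object()
sys.modules["fake_pyarrow"] = fake

# The import statement finds the entry in sys.modules and binds a local name to it.
import fake_pyarrow

assert fake_pyarrow is sys.modules["fake_pyarrow"]

# A second import reuses the cached module object instead of running anything again.
import fake_pyarrow as again

assert again is fake_pyarrow
assert again.loaded_marker is fake.loaded_marker
```

Since every importer ends up holding a name bound to the same module object, a patch has to target the name that the code under test actually looks up.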
test_module.py
import pytest
from mock import Mock, sentinel
from pyarrow import hdfs

from mymodule import HdfsSearch


class TestHdfsSearch(object):

    @pytest.fixture(autouse=True, scope='function')
    def setup(self, mocker):
        self.hdfs_mock = Mock(name='HadoopFileSystem', spec=hdfs.HadoopFileSystem)
        self.connect_mock = mocker.patch("mymodule.pyarrow.hdfs.connect", return_value=self.hdfs_mock)

    def test_initialize_HdfsSearch_should_connect_pyarrow_hdfs_file_system(self):
        HdfsSearch()

        self.connect_mock.assert_called_once_with(driver="libhdfs")

    def test_initialize_HdfsSearch_should_set_pyarrow_hdfs_as_file_system(self):
        hdfs_search = HdfsSearch()

        assert self.hdfs_mock == hdfs_search.fs

    def test_search_should_retrieve_directory_contents(self):
        hdfs_search = HdfsSearch()
        self.hdfs_mock.ls.return_value = sentinel.contents

        result = hdfs_search.search(".")

        self.hdfs_mock.ls.assert_called_once_with(path=".")
        assert sentinel.contents == result
Use context managers to patch built-ins
import os
from mock import patch


def test_patch_built_ins():
    with patch('os.curdir') as curdir_mock:  # the patch is active only inside the with block
        assert curdir_mock == os.curdir
    assert os.curdir == '.'  # the original value is restored once the block exits