I am having what I believe is a common problem with mock patching: I cannot figure out the right thing to patch.
I have two questions that I am hoping for help with.
An example using pyarrow that is currently causing me pain:
mymodule.py
import pyarrow

class HdfsSearch:
    def __init__(self):
        self.fs = self._connect()

    def _connect(self) -> object:
        return pyarrow.hdfs.connect(driver="libhdfs")

    def search(self, path: str):
        return self.fs.ls(path=path)
test_module.py
import pyarrow
import pytest
from mymodule import HdfsSearch

@pytest.fixture()
def hdfs_connection_fixture(mocker):
    mocker.patch("pyarrow.hdfs.connect")
    yield HdfsSearch()

def test_hdfs_connection(hdfs_connection_fixture):
    pyarrow.hdfs.connect.assert_called_once()  # <-- succeeds

def test_hdfs_search(hdfs_connection_fixture):
    hdfs_connection_fixture.search(".")
    pyarrow.hdfs.HadoopFileSystem.ls.assert_called_once()  # <-- fails
$ python -m pytest --verbose test_module.py
=========================================================================================================== test session starts ============================================================================================================
platform linux -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0 -- /home/bbaur/miniconda3/envs/dev/bin/python
cachedir: .pytest_cache
rootdir: /home/user1/work/app
plugins: cov-2.7.1, mock-1.10.4
collected 2 items
test_module.py::test_hdfs_connection PASSED [ 50%]
test_module.py::test_hdfs_search FAILED [100%]
================================================================================================================= FAILURES =================================================================================================================
_____________________________________________________________________________________________________________ test_hdfs_search _____________________________________________________________________________________________________________
hdfs_connection_fixture = <mymodule.HdfsSearch object at 0x7fdb4ec2a610>
def test_hdfs_search(hdfs_connection_fixture):
hdfs_connection_fixture.search(".")
> pyarrow.hdfs.HadoopFileSystem.ls.assert_called_once()
E AttributeError: 'function' object has no attribute 'assert_called_once'
test_module.py:16: AttributeError
You're not calling the assert on the Mock object. This is the correct assert:
hdfs_connection_fixture.fs.ls.assert_called_once()
Explanation:
When you access any attribute on a Mock object, it returns another Mock object, and calling a Mock returns its return_value, which is also a Mock by default.
Since you patched "pyarrow.hdfs.connect", you've replaced it with a Mock; let's call it Mock A. Your _connect method calls Mock A, so what it returns, and what gets assigned to self.fs, is Mock A's return_value.
Now let's break down what happens in the search method when you call self.fs.ls.
self.fs is Mock A's return_value, and accessing .ls on it returns yet another Mock object; let's call it Mock B. Mock B is the object that gets called with (path=path).
In your assert you're trying to access pyarrow.hdfs.HadoopFileSystem, but it was never patched. You need to do the assert on the Mock B object, which is at hdfs_connection_fixture.fs.ls
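The chain of mocks can be seen in isolation with a minimal sketch. It assumes a standalone MagicMock standing in for the patched pyarrow.hdfs.connect, so pyarrow is not needed; the variable names are illustrative only.

```python
from unittest import mock

# Stand-in for the patched pyarrow.hdfs.connect (assumption: no pyarrow here).
connect = mock.MagicMock(name="connect")

# Calling a mock returns its return_value, which is itself a mock.
fs = connect(driver="libhdfs")
is_return_value = fs is connect.return_value

# Attribute access on a mock creates (and caches) a child mock.
fs.ls(path=".")
same_child = fs.ls is connect.return_value.ls

# These asserts inspect the mocks that were actually called, so they pass.
connect.assert_called_once_with(driver="libhdfs")
fs.ls.assert_called_once_with(path=".")
connect.return_value.ls.assert_called_once_with(path=".")
```

In the test from the question, hdfs_connection_fixture.fs plays the role of fs here, which is why the assert has to go through it rather than through the unpatched HadoopFileSystem class.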
What to Patch
If you change the import in mymodule.py to from pyarrow.hdfs import connect, your patch will stop working.
Why is that?
When you patch something you're changing what a name points to, not the actual object.
Your current patch replaces the name pyarrow.hdfs.connect, and mymodule looks up the same name, pyarrow.hdfs.connect, so everything is fine.
However, if you use from pyarrow.hdfs import connect, mymodule will have imported the real pyarrow.hdfs.connect and created a reference to it under the name mymodule.connect.
So when you call connect inside mymodule you're accessing the name mymodule.connect, which is not patched.
That is why you would need to patch mymodule.connect when using from import.
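The effect can be reproduced without pyarrow. The sketch below builds two throwaway in-memory modules; lib, consumer and use are hypothetical names invented for the illustration, with lib.connect standing in for pyarrow.hdfs.connect and consumer for mymodule.

```python
import sys
import types
from unittest import mock

# Hypothetical "library" module exposing connect().
lib = types.ModuleType("lib")
lib.connect = lambda: "real"
sys.modules["lib"] = lib

# Hypothetical consumer module that uses "from lib import connect".
consumer = types.ModuleType("consumer")
exec(
    "from lib import connect\n"
    "def use():\n"
    "    return connect()\n",
    consumer.__dict__,
)
sys.modules["consumer"] = consumer

# Patching the name in lib does not affect consumer's own reference.
with mock.patch("lib.connect", return_value="fake"):
    result_patching_lib = consumer.use()

# Patching the name where it is looked up does.
with mock.patch("consumer.connect", return_value="fake"):
    result_patching_consumer = consumer.use()

print(result_patching_lib, result_patching_consumer)
```

The first call still reaches the real function because consumer.connect was bound at import time; only the second patch replaces the name that use() actually looks up.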
I'd recommend using from x import y when doing this kind of patching. It makes it more explicit what you're trying to mock and the patch will be limited to that module only, which can prevent unforeseen side-effects.
Source: the "Where to patch" section of the Python documentation.
To understand how patching works in Python, let's first understand the import statement.
When we use import pyarrow in a module (mymodule.py in this case), it does two things:
1. It looks up the pyarrow module in sys.modules.
2. It binds that module to a name (pyarrow) in the local scope, by doing something like: pyarrow = sys.modules['pyarrow']
NOTE: an import statement does not necessarily execute the module's code. It brings a name into the local scope; the module's code runs, as a side effect, only when Python can't find the module in sys.modules.
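The lookup-then-bind behaviour can be checked with a small sketch. It uses the standard-library json module purely as a convenient example; any module would do.

```python
import sys
import json  # any module works; json is just a convenient stdlib example

# After the import, the module object is cached in sys.modules ...
cached = sys.modules["json"]

# ... and the local name is simply bound to that same object.
same_object = cached is json

# A second import finds the cached module and only binds another name.
import json as json_again
still_same = json_again is json
```

Because every importer shares the one cached module object, patching an attribute on that object is visible everywhere, while rebinding a name affects only the module that holds that name.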
So, to patch the pyarrow imported in mymodule.py, we need to patch the pyarrow name in the local scope of mymodule.py:
patch('mymodule.pyarrow', autospec=True)
test_module.py
import pytest
from unittest.mock import Mock, sentinel
from pyarrow import hdfs
from mymodule import HdfsSearch

class TestHdfsSearch(object):
    @pytest.fixture(autouse=True, scope='function')
    def setup(self, mocker):
        # Mock restricted to the HadoopFileSystem interface, returned by the patched connect()
        self.hdfs_mock = Mock(name='HadoopFileSystem', spec=hdfs.HadoopFileSystem)
        self.connect_mock = mocker.patch("mymodule.pyarrow.hdfs.connect", return_value=self.hdfs_mock)

    def test_initialize_HdfsSearch_should_connect_pyarrow_hdfs_file_system(self):
        HdfsSearch()
        self.connect_mock.assert_called_once_with(driver="libhdfs")

    def test_initialize_HdfsSearch_should_set_pyarrow_hdfs_as_file_system(self):
        hdfs_search = HdfsSearch()
        assert self.hdfs_mock == hdfs_search.fs

    def test_search_should_retrieve_directory_contents(self):
        hdfs_search = HdfsSearch()
        self.hdfs_mock.ls.return_value = sentinel.contents
        result = hdfs_search.search(".")
        self.hdfs_mock.ls.assert_called_once_with(path=".")
        assert sentinel.contents == result
Use context managers to patch built-ins
import os
from unittest.mock import patch

def test_patch_built_ins():
    # curdir_mock exists only inside the with block; the patch is undone on exit
    with patch('os.curdir') as curdir_mock:
        assert curdir_mock == os.curdir
    assert os.curdir == '.'