How do I test that my program is robust to unexpected shut-downs?
My python code will run on a microcontroller that shuts off unexpectedly. I would like to test each part of the code rebooting unexpectedly and verify that it handles this correctly.
Attempt: I tried putting the code into its own process and terminating it early, but this doesn't work because MyClass calls 7zip from the command line, which continues running even after the process dies:
import multiprocessing
import os
import time

class MyClass(multiprocessing.Process):
    ...

    def run(self):
        os.system("7z a myfile.7z myfile")

process = MyClass()
process.start()
time.sleep(4)
print("terminating early")
process.terminate()
print("done")
What I want:
class TestMyClass(unittest.TestCase):
    def test_MyClass_continuity(self):
        myclass = MyClass().start()
        myclass.kill_everything()
        myclass = MyClass().start()
        self.assert_everything_worked_as_expected()
Is there an easy way to do this? If not, how do you design robust code that could terminate at any point (e.g. testing state machines)?
Similar question (unanswered as of 26/10/21): Simulating abnormal termination in pytest
Thanks a lot!
Your logic starts a process wrapped within the MyClass object, which itself spawns a new process via the os.system call. When you terminate the MyClass process, you kill the parent process but leave the 7zip process running as an orphan.

Moreover, the process.terminate method sends a SIGTERM signal to the child process. The child process can intercept that signal and perform some cleanup routines before terminating. This is not ideal if you want to simulate a situation where there is no chance to clean up (a power loss). You most likely want to send a SIGKILL signal instead (on Linux).
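The difference is easy to demonstrate: a handler installed for SIGTERM still runs when you call process.terminate(), while SIGKILL cannot be caught at all. A minimal sketch (the worker and its handler are illustrative, and this is Unix-specific):

```python
import signal
import time
import multiprocessing

def worker():
    # Install a "cleanup" handler: SIGTERM can be intercepted...
    def on_sigterm(signum, frame):
        # ...so this still runs, which a real power loss would not allow
        raise SystemExit(0)

    signal.signal(signal.SIGTERM, on_sigterm)
    while True:
        time.sleep(0.1)

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    time.sleep(0.5)        # give the child time to install its handler
    p.terminate()          # SIGTERM: the handler converts it into a clean exit
    p.join()
    print(p.exitcode)      # 0 on Unix; a SIGKILL'd process would report -9
```

So a terminate()-based test exercises graceful shutdown, not a power cut.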
To kill the parent and child process, you need to address the entire process group.
import os
import time
import signal
import multiprocessing

class MyClass(multiprocessing.Process):
    def run(self):
        # Start a new session so this process (and anything it spawns)
        # gets its own process group, separate from the test runner;
        # otherwise killpg below would also kill the main script
        os.setsid()
        # Ping localhost for a limited amount of time
        os.system("ping -c 12 127.0.0.1")

process = MyClass()
process.start()
time.sleep(4)
print("terminating early")
# Send SIGKILL to the entire process group
group_id = os.getpgid(process.pid)
os.killpg(group_id, signal.SIGKILL)
print("done")
The above works only on Unix-like OSes, not on Windows. On Windows, you can use the psutil module instead.
import os
import time
import multiprocessing
import psutil

class MyClass(multiprocessing.Process):
    def run(self):
        # Ping localhost for a limited amount of time
        # (Windows ping counts with -n, Unix ping with -c)
        count_flag = "-n" if os.name == "nt" else "-c"
        os.system(f"ping {count_flag} 12 127.0.0.1")

def kill_process_group(pid):
    process = psutil.Process(pid)
    children = process.children(recursive=True)
    # First kill all children
    for child in children:
        child.kill()
    psutil.wait_procs(children)
    # Then kill the parent process
    process.kill()
    process.wait()

# The __main__ guard is required on Windows, where multiprocessing
# spawns a fresh interpreter that re-imports this module
if __name__ == "__main__":
    process = MyClass()
    process.start()
    time.sleep(4)
    print("terminating early")
    kill_process_group(process.pid)
    print("done")
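With a hard-kill primitive like this, the test the question sketches can be approximated. Below is a hedged unittest example: FileWriter is a hypothetical stand-in for MyClass that marks completion with a file, and since it spawns no subprocess a plain os.kill with SIGKILL suffices (when subprocesses are involved, kill the whole group as shown above; this sketch is Unix-only):

```python
import os
import time
import signal
import tempfile
import unittest
import multiprocessing

# Hypothetical marker file standing in for MyClass's real output
MARKER = os.path.join(tempfile.gettempdir(), "myclass_test.done")

class FileWriter(multiprocessing.Process):
    """Hypothetical stand-in for MyClass: does slow work, then writes a marker file."""
    def run(self):
        time.sleep(0.2)                 # pretend to do slow work (e.g. archiving)
        with open(MARKER, "w") as f:
            f.write("ok")

class TestContinuity(unittest.TestCase):
    def setUp(self):
        if os.path.exists(MARKER):
            os.remove(MARKER)

    def test_survives_hard_kill(self):
        # First run: SIGKILL it mid-work, like a sudden power cut
        p = FileWriter()
        p.start()
        time.sleep(0.05)                # let it get partway through
        os.kill(p.pid, signal.SIGKILL)  # no handler or cleanup can run
        p.join()
        self.assertFalse(os.path.exists(MARKER))
        # Second run: a fresh instance must still complete correctly
        p = FileWriter()
        p.start()
        p.join()
        self.assertTrue(os.path.exists(MARKER))
```

Run it with python -m unittest; the second run's assertion is where you would check your real recovery logic.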
I think this is a question of data persistence and consistency. You need to make sure all data that is persistent (i.e. written to disk) is consistent, too.
Imagine some sort of data written to a status file. What will be read by the application after an unexpected termination? Half of the new status and half of the previous one? Half of the new status and the rest all 0x00?
So the answer to your question "How do you design robust code that could terminate at any point?" is: use atomic operations when working with persistent data. Most databases give some guarantees in that direction. For local files, I usually rely on renaming: I write to a temporary file without worrying about consistency at all, and only once it is complete (be sure to flush the buffers!) and therefore consistent do I use the atomic rename operation to make the temporary file the new single point of truth. If the application terminates unexpectedly at any point, the persistent data will always be consistent: it will be either the previous state (plus some garbage in a temporary file) or the new state, but nothing in between.
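The write-then-rename pattern can be sketched as follows (the function name and JSON format are illustrative; os.replace is the atomic step):

```python
import json
import os
import tempfile

def save_state_atomically(path, state):
    """Write state to a temp file, flush it to disk, then atomically rename.

    A crash at any point leaves either the old file or the complete new
    one on disk, never a half-written mix.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory, so the rename
    # stays within one filesystem and remains atomic
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())   # push the bytes to disk, not just a buffer
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)         # leave no garbage behind on failure
        raise

# usage: save_state_atomically("status.json", {"step": 3})
```

A reader of the file will always see either the old state or the new one in full.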
Whatever your choice is, be sure to read the documentation about its atomicity guarantees to understand what can happen. For example, a file rename interrupted at the wrong point in time can look like the creation of a hard link.
Note that just killing a process is not the same as cutting the power, because the OS keeps running and closes files, flushes buffers etc. For example when using SQLite I rarely see "journal files" when just killing the application, but I see them quite often when cutting the power.