I'd like some ideas about how to test objects that can block while waiting for another participant. The specific unit under test is the channel between the participants; the participants themselves are mock fixtures for the purposes of the tests.
It would be nice to validate that the participants do deadlock when they are expected to, but this is not terribly important to me, since what happens after the deadlock can reasonably be described as undefined.
More critical would be to verify that the defined interactions from the participants do not deadlock.
In either case, I'm not really sure what the optimal testing strategy should be. My current notion is to have the test runner fire off a thread for each participant, sleep for a while, then check whether the child threads have returned. If they have not returned in time, assume that they have deadlocked, safely terminate the threads, and fail the test (or pass it, if the deadlock was expected).
This feels a bit probabilistic, since there could be all sorts of reasons (however unlikely) that a thread might take longer than expected to complete. Are there any other good ways of approaching this problem?
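For concreteness, the timeout approach I have in mind might look roughly like the Python sketch below. The name `run_participants`, the 1-second default, and the use of `queue.Queue` as a stand-in channel are all assumptions for illustration, not the real code under test. Python has no safe way to kill a thread, so in this sketch stuck threads are marked as daemons and simply abandoned rather than terminated.

```python
import queue
import threading
import time


def run_participants(participants, timeout=1.0):
    """Run each callable in its own daemon thread.

    Returns True if every thread finished within `timeout` seconds,
    False if at least one is still running (presumed deadlocked).
    """
    threads = [threading.Thread(target=p, daemon=True) for p in participants]
    for t in threads:
        t.start()
    # Share one deadline across all joins so the total wait is bounded.
    deadline = time.monotonic() + timeout
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))
    return not any(t.is_alive() for t in threads)


# A defined interaction: one sender, one receiver.
channel = queue.Queue(maxsize=1)  # hypothetical stand-in for the real channel
finished = run_participants([lambda: channel.put("hello"), channel.get])

# An expected deadlock: two receivers and nobody sending.
empty = queue.Queue()
stuck = not run_participants([empty.get, empty.get], timeout=0.2)
```

A `False` result only means "did not finish in time", which is exactly the uncertainty I'm worried about.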
EDIT: I'm sure soundness in testing would be nice, but I don't think I need to have it. I'm thinking in terms of three levels of testing certainty.
The first, of course, is a valuable test to pass, but ShiDoiSi's answer speaks to the impracticality of that. The second one is significantly weaker than the first, but still hard: how can you establish that a network of processes has actually deadlocked? I'm not sure that's any easier to prove than the first (maybe a lot harder).
The last one is more like what I have in mind.
The only way to reliably test for deadlocks is to instrument the locking subsystem to detect and report them. The last time I had to do this, we built a debug version of it that recorded which threads held which locks and checked for potential deadlocks on every lock-obtain call. It can be a heavyweight operation in a system with a lot of locking going on, but we found it to be so valuable that we reorganized the subsystem so we could turn it on and off with a switch at runtime, even in production builds.