I'm aware of rspec --bisect for isolating specs that fail intermittently based on the sequence of examples run. But I have a case in a legacy app (I've never worked in this codebase before) where specs fail intermittently even when run in isolation, so the sequence is not a factor.
In other words, if I run
rspec ./spec/services/locate_missing_approver_service_spec.rb:5
it will fail perhaps 1 time in 10. That 10% figure is a guess based on my experience of running this, but it might be useful if I had a way of accurately gathering statistics like that, so I could use git bisect to determine where the intermittent failures begin.
At the moment I'm manually running a function to check whether the failures occur in any given commit or with any temporarily changed code, and I could extend it to gather stats (rough sketch after the function below), but I'm not sure if that is the best path to go down.
function test-flaky-spec() {
  while true; do
    rspec ./spec/services/locate_missing_approver_service_spec.rb:5
    if [[ $? -ne 0 ]]; then
      break
    fi
    sleep 0.1
  done
}
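If I went down that path, the extension I have in mind would be something like this (rough, untested sketch; the 100-run count is arbitrary):

function spec-failure-rate() {
  local total=100 failures=0
  for ((i = 1; i <= total; i++)); do
    if ! rspec ./spec/services/locate_missing_approver_service_spec.rb:5 > /dev/null 2>&1; then
      failures=$((failures + 1))
    fi
  done
  echo "$failures failures out of $total runs"
}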
So the question really boils down to:
Are there any generally accepted troubleshooting techniques for this sort of scenario?
This question was asked in the context of RSpec. This answer is broader than that. It should apply to any suite of automated tests in any language.
A flaky test is a test that fails inconsistently. The primary cause of flakiness is a dependency on something in the environment that changes from one run to the next. Here are three ways the environment might change between test runs:
A non-deterministic test relies on a part of the environment that is not fixed, like the system clock, a random number, or access to a network resource. Asynchronous code (like file I/O) is also non-deterministic and can cause intermittent failures.
The good thing about this kind of flakiness is that you can reproduce it locally and in isolation. So, the only thing you need to do is find the thing that is changing in the environment and mock it. For example:
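A minimal RSpec sketch of that idea; the frozen time and the stubbed UUID are arbitrary values chosen for illustration, not anything from the codebase in question:

require "securerandom"

RSpec.describe "code that depends on the clock and on randomness" do
  it "behaves deterministically once the changing inputs are stubbed" do
    frozen_time = Time.new(2024, 1, 1, 12, 0, 0)
    allow(Time).to receive(:now).and_return(frozen_time)            # the clock no longer moves
    allow(SecureRandom).to receive(:uuid).and_return("fixed-uuid")  # randomness no longer varies

    expect(Time.now).to eq(frozen_time)
    expect(SecureRandom.uuid).to eq("fixed-uuid")
  end
end

The same applies to network access (stub the HTTP call) and to async work (wait for it to finish, or make it synchronous in the test).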
A leaky test is one that modifies some global state, then fails to clean up after itself. After it runs, all subsequent tests are starting from an unpredictable environment, which may cause some of them to fail. Some kinds of global state to watch out for are environment variables, class variables, and global data stores, like memcache, redis, or a database.
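As a concrete illustration (the ENV key is hypothetical), a test that flips an environment variable and never restores it leaks that value into every later test. An around hook is one way to make it clean up after itself:

RSpec.describe "a test that touches ENV" do
  around do |example|
    original = ENV["APPROVAL_MODE"]   # remember the pre-test value
    begin
      example.run
    ensure
      ENV["APPROVAL_MODE"] = original # restore it, even if the example fails
    end
  end

  it "can change the variable without leaking it into later tests" do
    ENV["APPROVAL_MODE"] = "strict"
    expect(ENV["APPROVAL_MODE"]).to eq("strict")
  end
end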
Leaky tests are harder to pin down. They can be reproduced locally, but because they are order dependent, they will not fail when run in isolation. The key to debugging these failures is to inspect the state of the environment at the beginning of the test to ensure that it matches your expectations.
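A sketch of that kind of up-front check, using hypothetical stand-ins (an ENV flag and an Approval ActiveRecord model) for whatever state the failing test actually depends on:

RSpec.describe "the test that fails when run with the full suite" do
  before do
    # A failure here points at an earlier, leaky test rather than at this one.
    expect(ENV["APPROVAL_MODE"]).to be_nil, "ENV['APPROVAL_MODE'] leaked from an earlier test"
    expect(Approval.count).to eq(0), "database rows leaked from an earlier test"
  end

  it "runs against the clean state it expects" do
    # ... the original expectations go here
  end
end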
The key to permanently resolving these failures lies in finding out which test(s) are not cleaning up after themselves. Some tools (like RSpec) have the ability to "bisect" a test suite to determine which specific tests run in what specific order will cause a downstream test to fail consistently. This is a great help, but doesn't always lead to a definitive answer. It could be the combination of multiple leaky tests that cause a downstream test to fail.
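With RSpec, that looks like the following, where 12345 stands in for the seed printed by the failing run:

rspec --seed 12345            # re-run the suite in the order that produced the failure
rspec --seed 12345 --bisect   # search for the minimal set of examples that still reproduces it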
A race condition test is one that uses a shared resource during execution. When tests run in parallel (typically in CI), another test can modify that shared resource mid-run, so the values a test expects may no longer be there.
This kind of failure is the hardest to reproduce. You typically wouldn't run tests in parallel locally, and because the failure depends on parallel execution, you won't be able to reproduce it in isolation either. Once you do see a failure, look for places where the test uses a globally available shared resource and find a way to stop sharing it. For example, try giving each test its own in-memory implementation of the shared resource.
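Sketching that last suggestion, with InMemoryCache as a hypothetical stand-in for a shared store such as redis:

# A tiny, test-local replacement for a shared cache.
class InMemoryCache
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end
end

RSpec.describe "code that normally talks to a shared cache" do
  it "cannot be disturbed by parallel workers when given its own cache" do
    cache = InMemoryCache.new        # fresh state owned by this example alone
    cache.set("approver:1", "Alice")
    expect(cache.get("approver:1")).to eq("Alice")
  end
end

For this to work, the code under test has to accept the cache as a dependency (a constructor argument or a configurable setting), so production keeps using the real store while each example gets its own in-memory one.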
Flakiness can almost always be traced back to tests depending on something in their environment that changes unexpectedly. A few best practices for avoiding it:
- Make every changing input deterministic: stub the clock, stub or seed randomness, and avoid touching the network or the real filesystem from tests.
- Leave the environment exactly as you found it: restore environment variables, class-level state, and global data stores after every test.
- Don't share mutable resources between tests; give each test its own (ideally in-memory) instance.
Also, use this guide to determine what kind of flakiness to look for:
- Fails sometimes even when run locally and in isolation: a non-deterministic test; mock the changing input.
- Passes in isolation but fails when run after other tests: a leaky test; find which earlier test isn't cleaning up.
- Fails only when tests run in parallel (typically in CI): a race condition; stop sharing the resource.