 

What is a systematic approach to debug intermittently failing specs?

I have four tests in my Capybara/Rspec suite that keep failing (a real problem for CI deployment).

Worst of all, these tests fail intermittently, and often only when the entire suite is run, which makes them very difficult to debug.

They all involve ajax requests, either submitting a remote form or clicking a remote link, followed by expect(page).to have_content 'My Flash Message'.

These tests even fail intermittently within the same test cycle. For example, I have several models that behave similarly, so I am iterating through them in my specs.

e.g., 
['Country', 'State', 'City'].each do |object|
  let(:target) { create object.to_sym }
  it 'runs my frustrating test' do 
  end
end

Sometimes country fails, sometimes state, sometimes everything passes.

I have tried adding wait: 30 to the expect statement. I have tried adding sleep 30 before the expect statement. I'm still getting intermittent passes.
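For context, Capybara's waiting matchers behave roughly like the polling loop below (a simplified illustration, not Capybara's actual implementation). It shows why `wait: 30` only helps when the expected content eventually arrives: if the click that triggers the ajax request never fired, no timeout is long enough.

```ruby
# Simplified sketch of Capybara-style waiting: retry the check until
# it passes or the timeout expires. A fixed `sleep 30` wastes time on
# fast runs and can still lose the race on slow ones; polling does not.
def wait_until(timeout: 2, interval: 0.05)
  deadline = Time.now + timeout
  loop do
    return true if yield
    raise "condition not met within #{timeout}s" if Time.now > deadline
    sleep interval
  end
end

# Usage: poll for a condition instead of sleeping a fixed interval.
flag = false
Thread.new { sleep 0.1; flag = true }
wait_until(timeout: 1) { flag }   # returns true once flag flips
```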

There is quite a bit of information out there describing finicky ajax tests, but I have not found much about how to debug and fix these kinds of problems.

I'm really grateful for any advice or pointers from others, before I pull all my hair out!

UPDATE

Thank you for all these excellent responses. It's been useful to see that others have grappled with similar issues, and that I'm not alone.

So, is there a solution?

The suggestions to use debugging tools such as pry, byebug, and Poltergeist's debug feature (thanks @Jay-Ar Polidario, @TomWalpole) were useful to confirm what I thought I already knew: namely, as suggested by @BM5K, that the features work consistently in the browser, and the errors lie within the tests.

I experimented with adjusting timeouts and retries (@Jay-Ar Polidario, @BM5K), and while these brought some improvement, they were still not a consistent fix. More importantly, this approach felt like patching holes rather than a proper repair, so I was never entirely comfortable with it.

Ultimately I went with a major rewrite of these tests. This has entailed breaking up multi-step features, and setting up and testing each step individually. While purists may claim this is not truly testing from the user's perspective, there is sufficient overlap between each test that I'm comfortable with the result.

In going through this process, I did notice that all of these errors were related to "clicking on things, or filling forms", as @BoraMa suggested. Though in this case the experience was reversed — we had adopted .trigger('click') syntax because capybara + poltergeist was reporting errors clicking on elements using click_link or find(object).click, and it was these tests that were problematic.

To avoid these problems I've removed JS from the tests as much as possible. i.e., testing the majority of the feature without JS enabled, and then creating very short, targeted JS specs to test specific JS responses, features or user feedback.
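A sketch of what that split can look like (the paths, labels, and messages here are hypothetical, not from my app): most of the feature runs under the default non-JS driver, and only the spec that genuinely needs JavaScript is tagged js: true.

```ruby
# Hypothetical feature spec layout: JS disabled by default, with one
# small, targeted example opting in to the JS driver via `js: true`.
RSpec.feature 'Creating a city' do
  scenario 'creates a city from the form' do       # rack_test, no JS
    visit new_city_path
    fill_in 'Name', with: 'Springfield'
    click_button 'Save'
    expect(page).to have_content 'City created'
  end

  scenario 'shows the flash after a remote submit', js: true do
    visit cities_path
    click_link 'Quick add'
    expect(page).to have_content 'My Flash Message'
  end
end
```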

So there is no single fix, but rather a major refactoring that, in all honesty, probably needed to happen and was a valuable exercise. The tests have lost some features by breaking everything up into individual tests, but as a whole this has made them easier to read and maintain.

There are still a couple of tests that are occasionally showing red, and will need some more work. But overall a great improvement.

Thank you all for the great guidance, and reassuring me that interactions in the testing environment could be the root cause.

Asked May 09 '16 by Andy Harvey


1 Answer

Intermittently failing tests are a pain to troubleshoot, but there are some things you can do to make life easier. First would be to remove any looping or shared examples. Explicitly stating each expectation should make it more clear which example combination is failing (or make it even more obvious that it is indeed random).
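Applied to the loop from the question, unrolling might look like this (the example bodies are placeholders): each model gets its own explicit example, so a failure report names exactly which one broke.

```ruby
# Unrolled version of the ['Country', 'State', 'City'].each loop:
# one explicit example per model, no shared let across iterations.
it 'runs the test for Country' do
  target = create(:country)
  # ... exercise the feature and assert on the flash message
end

it 'runs the test for State' do
  target = create(:state)
  # ...
end

it 'runs the test for City' do
  target = create(:city)
  # ...
end
```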

Over the course of several runs, track which tests are failing. Are they all in the same context group?

Are you mixing and matching javascript tests and non-javascript tests? If you are, you may be running into database issues (I've seen problems caused by switching database cleaner strategies mid context block).
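One common way to avoid that (a typical spec_helper arrangement, not necessarily what the asker has) is to choose the database_cleaner strategy per example based on the :js tag, so a transaction is used for plain specs, truncation for JS specs, and the strategy never changes partway through a context block.

```ruby
# Typical database_cleaner setup: strategy picked per example from
# the :js metadata, so JS specs (separate server process, separate
# DB connection) use truncation while everything else stays
# in a fast rolled-back transaction.
RSpec.configure do |config|
  config.before(:suite) do
    DatabaseCleaner.clean_with(:truncation)
  end

  config.before(:each) do |example|
    DatabaseCleaner.strategy =
      example.metadata[:js] ? :truncation : :transaction
    DatabaseCleaner.start
  end

  config.after(:each) do
    DatabaseCleaner.clean
  end
end
```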

Make sure you consider any parent context blocks the tests are in.

And if none of that narrows down your search, use a gem that allows you to retry failing tests.

I used rspec-retry in the past, but have found it to be unreliable lately. I've switched to rspec-repeat. I usually leave these off in development (configured for 1 try) and run with multiple tries on CI (usually 3). That way I can get a feel for which tests are wobbly locally, but not let those tests break my build (unless they fail consistently).
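With rspec-retry, that 1-try-locally / 3-tries-on-CI arrangement can be configured along these lines (reading the CI environment variable is my assumption about how to detect CI):

```ruby
# rspec-retry configuration: retry only on CI, and log each retry
# so flaky examples remain visible in the build output.
require 'rspec/retry'

RSpec.configure do |config|
  config.verbose_retry = true                   # print a line per retry
  config.default_retry_count = ENV['CI'] ? 3 : 1
end
```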

TL;DR

Most of the intermittently failing tests I encounter have a lot of moving pieces (rails, capybara, database cleaner, factory girl, phantomjs, rspec just to name a few). If the code is tested AND the specs frequently pass AND the feature consistently works in the browser chances are some interaction in your testing environment is the root cause of the intermittent failures. If you can't track that down, retry the failing specs a couple of times.

Answered Nov 09 '22 by BM5k