
Fixtures and Selenium and Rails (oh my?)

What data do you use with Selenium tests on Rails apps? Do you load from fixtures? Use an existing dev db? Use a separate (non-fixture) db?

I'm considering my options here. I have a Rails app with a large Selenium test suite that runs on a modified version of Selenium Grid. Part of the process, right now, is loading a large set of fixtures, once, before the test suite runs. It's a LOT of data. Most of it is reporting info exported from our production db. When I set it up originally, I exported the data to yaml from Oracle.

Now there's been a schema change in some of the reporting tables, so of course I have to regenerate the fixture data. There is so much of it that it's not worthwhile to edit the files by hand. But it seems inefficient to have to regenerate for every little schema change - not to mention that it's yet another step to remember to do. Is there a better way?

EDIT: I originally intended to load the fixtures before each test and unload them after each test, like regular Rails tests. But it takes about 15 minutes to load the fixtures due to this reporting data. There are 200+ tests, and the suite runs every 12 hours. I cannae bend spacetime captain!

EDIT 2: I also agree that having this big set of fixtures is a bad smell. I'm not sure how to pare it down, though, because the reports aggregate a lot of data and much of the value of the Selenium tests is that they test the reports.

Even if it's a small set of data, though...it's still another set to keep co-ordinated with schema changes. (We have a separate, smaller set for unit, functional, and [Rails] integration tests.)

Which brings me back to my original question - are there other options besides doing it by hand, or remembering to regenerate them each time?

Sarah Mei asked Mar 13 '09 19:03


3 Answers

If you can, the best possible thing to do is have a system in which each Selenium test gets its own data state (i.e. DB tables dropped and recreated, bootstrap data re-inserted, and caches cleared). This is easier said than done and usually is only possible if the project planned for it from the start.
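As a rough sketch of what "each test gets its own data state" could look like in Ruby: the helper name `reset_database`, the table names, and the `RecordingConnection` class are all made up for illustration. The recording class just stands in for a real connection so the example runs on its own; in a Rails app you would presumably pass `ActiveRecord::Base.connection`, which also responds to `execute`.

```ruby
# Stand-in for a real DB connection; it only records the SQL it is given.
# In a Rails app, ActiveRecord::Base.connection also responds to #execute.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end
end

# Hypothetical helper: empty every table, then re-insert the bootstrap rows.
# Cache clearing would go here too, if the app caches anything.
def reset_database(connection, tables, bootstrap_statements)
  tables.each { |table| connection.execute("TRUNCATE TABLE #{table}") }
  bootstrap_statements.each { |sql| connection.execute(sql) }
end

conn = RecordingConnection.new
reset_database(
  conn,
  %w[report_rows accounts], # hypothetical tables, children before parents
  ["INSERT INTO accounts (id, name) VALUES (1, 'demo')"]
)
puts conn.statements
```

You would call something like this from the setup hook of each Selenium test. It only stays fast if the bootstrap data is small, which is the point made further down.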

The next best thing is to have a consistent DB state for each test suite/run. This is not as nice since there is now a strong chance that some tests will depend on the success of previously run tests, making it more difficult to identify true failures vs. false negatives.

The worst case, IMO, is to use a static DB in which each test run mutates the data. This almost always leads to problems and is usually a "project smell". The key to doing it the "right way" (again, IMO) is to be vigilant about any state/schema change and capture it as part of the automated test/build process.

Rails does a good job with this already with Migrations, so take advantage of them! Without knowing your situation, I'd generally question the need to run Selenium tests against a snapshot of the full DB. Most DBs can (or should) be distilled down to less than 1MB for automated testing, making automated schema migrations and data reset much more efficient.

The only time I've seen a "valid" reason for massive DBs for Selenium tests is when the DB itself contains large chunks of "logic data" in which the data affects the application flow (think: data-driven UI).

Patrick Lightbody answered Nov 11 '22 15:11


I think you're asking two questions here that are intertwined, so let me break it down:

  • You want to get test data into and out of your DB quickly and fixtures aren't doing it for you.
  • You've been burnt by a schema change and you want to make sure that whatever you do doesn't require eight iterations themed "fiddling with the test data...still" :)

You've got a couple of alternatives here which I've hashed out below. Because you've mentioned Oracle I'm using Oracle technologies here but the same thing is true for other DB platforms (e.g. PostgreSQL):

  1. Rake tasks that call PL/SQL scripts to generate the data. Nasty, horrible, evil idea; don't do it unless there's no other option. I did it on one project that needed to load in billions of rows for some infrastructure architecture tests. I still sulk about it.
  2. Get your DB into a dump format. For speedy binary dumps, check out the exp/imp and Data Pump utilities. These allow quick setup and teardown of your DB. On a Rails project I worked on, we used rake tasks to exp/imp a database of around 300k records in under a minute. Also check SQL*Loader, which is the logical dump utility; as it's logical it's slower, and it requires control scripts to help SQL*Loader understand the dumps. However, the benefit of the logical dump is that you can run transformation scripts over it to massage the data into the latest format. Sadly, just like fixtures, all these tools are pretty sensitive to changes in the schema.
  3. Use a plugin such as Machinist or Factory Girl to make the generation of the data nicer. You still incur the penalty of using ActiveRecord to set up the DB, but these fake object generators will help you stay close to your migrations and are a lot less hassle to maintain than fixtures.
  4. Combine approaches 2 and 3. What happens here is that you make some test data with, say, Machinist. You export that test data to a dump and then reload the dump during each test run. When the schema changes, update the Machinist config and re-export.
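
To make approach 4 a bit more concrete, here is a minimal sketch of the generate-once, reload-often workflow. Everything in it is a stand-in: `ReportRowFactory` is a hand-rolled factory (not the Machinist or Factory Girl API), the column names are invented, and a JSON file plays the role of the exp/imp dump so that the example runs with nothing but the Ruby standard library.

```ruby
require "json"
require "tmpdir"

# Hand-rolled factory, NOT the Machinist or Factory Girl API: sensible
# defaults plus per-call overrides, with a sequence to keep rows distinct.
class ReportRowFactory
  def initialize
    @sequence = 0
  end

  def build(overrides = {})
    @sequence += 1
    { "id" => @sequence, "region" => "north", "amount" => 100 }.merge(overrides)
  end
end

# Slow step: run only when the schema or the factory definition changes.
# A JSON file stands in for a real database dump here.
def export_dump(rows, path)
  File.write(path, JSON.generate(rows))
end

# Fast step: run at the start of every test run.
def load_dump(path)
  JSON.parse(File.read(path))
end

factory = ReportRowFactory.new
rows = [factory.build, factory.build("region" => "south", "amount" => 250)]

dump_path = File.join(Dir.mktmpdir, "report_rows.json")
export_dump(rows, dump_path)
reloaded = load_dump(dump_path)
puts reloaded.length
```

The design point is that the expensive ActiveRecord-driven generation happens once per schema change, while every test run pays only for the cheap bulk load.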

Hope that helps.

robertpostill answered Nov 11 '22 15:11


I'm currently on a project with an enormous Selenium test suite--actually, the one Selenium Grid was written for--and our tests use a small amount of reference data (though we don't use Rails YAML fixtures) and object factories for one-off data needed for particular tests.

Alternatively, on many of the ThoughtWorks Rails projects I've been on we've written checkin scripts that incorporate a number of pre-commit hooks--for example, running the tests before allowing a commit. One thing you might consider trying is writing (or customizing) a similar checkin script that will check for schema changes and reload the reference data as needed.
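
One way such a check could work, sketched under some assumptions: the schema lives in a single file (in Rails that is typically `db/schema.rb`), and a digest of it is stored in a stamp file next to the reference data. The helper names and the stamp file are hypothetical, and the block passed in is where the real regeneration would go.

```ruby
require "digest"
require "tmpdir"

# True when the schema file's digest differs from the one recorded the
# last time the reference data was regenerated (or when no stamp exists).
def schema_changed?(schema_path, stamp_path)
  current = Digest::SHA256.file(schema_path).hexdigest
  previous = File.exist?(stamp_path) ? File.read(stamp_path).strip : nil
  current != previous
end

# Remember the digest of the schema the reference data was built against.
def record_schema(schema_path, stamp_path)
  File.write(stamp_path, Digest::SHA256.file(schema_path).hexdigest)
end

# Runs the given block (the expensive reload) only if the schema changed.
def reload_if_schema_changed(schema_path, stamp_path)
  return false unless schema_changed?(schema_path, stamp_path)

  yield
  record_schema(schema_path, stamp_path)
  true
end

# Demo against a throwaway directory instead of a real Rails app.
dir = Dir.mktmpdir
schema = File.join(dir, "schema.rb")
stamp = File.join(dir, "schema.sha256")
File.write(schema, "create_table :reports")

reloads = 0
first = reload_if_schema_changed(schema, stamp) { reloads += 1 }
second = reload_if_schema_changed(schema, stamp) { reloads += 1 }
File.write(schema, "create_table :reports; add_column :reports, :total")
third = reload_if_schema_changed(schema, stamp) { reloads += 1 }
```

Wired into a checkin script or a rake task, this removes the "remember to regenerate" step: the reload happens only when the schema file actually differs from the one the data was built against.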

See e.g. Paul Gross's rake commit tasks on Github.

Brian Guthrie answered Nov 11 '22 15:11