Large test dataset in repository

Suppose I'm developing some open-source software, written in an interpreted language and managed as a Git repo, that requires a large dataset (300+ MB) for basic testing.

Should the test data go into the same repository as the source code, with a compileToZipFile.sh script for publishing releases? Or would it be better to store them as two separate repositories, a srcRepo and a testRepo?

Any best practices/conventions would be appreciated.

supyo asked Jun 03 '13 18:06



1 Answer

I suppose the best answer to this question depends on your needs.

At my work, we segregate our code and test data by environment type, such as:

  • Test
  • QA
  • Staging
  • Production

Certain environments have the same data as production, while others have older (or completely different) data. The benefits of this are:

  • Sandboxes in which to test, implement, and 'play' with new ideas and technologies.
  • You aren't affecting the live, customer-facing data.
  • Integration tests can be tailored to specific aspects that are independent of the main code base.
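As a rough illustration of the environment split described above, here is a minimal Python sketch, assuming the code under test picks its dataset from an environment variable. The variable name `APP_ENV`, the `/data/...` paths, and the `DataSource`/`select_data_source` names are all hypothetical. The guard against the production environment reflects the point that tests should not touch live, customer-facing data.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSource:
    """Where one environment's data lives (illustrative values only)."""
    path: str
    production_like: bool  # True if this copy mirrors production data


# Hypothetical mapping from environment name to its own copy of the data.
# Names follow the list above; paths and flags are assumptions.
ENVIRONMENTS = {
    "test": DataSource("/data/test", production_like=False),
    "qa": DataSource("/data/qa", production_like=False),
    "staging": DataSource("/data/staging", production_like=True),
    "production": DataSource("/data/production", production_like=True),
}


def select_data_source(
    env: Optional[Mapping[str, str]] = None,
    allow_production: bool = False,
) -> DataSource:
    """Resolve the data source for the current environment.

    Reads the environment name from APP_ENV (a hypothetical variable),
    defaulting to "test". Unknown names raise ValueError, and the
    production environment is refused unless explicitly allowed, so a
    test run cannot touch live data by accident.
    """
    env = os.environ if env is None else env
    name = env.get("APP_ENV", "test").strip().lower()
    if name not in ENVIRONMENTS:
        raise ValueError(
            f"unknown environment {name!r}; expected one of {sorted(ENVIRONMENTS)}"
        )
    if name == "production" and not allow_production:
        raise RuntimeError("refusing to use production data without allow_production=True")
    return ENVIRONMENTS[name]
```

For example, running the suite with `APP_ENV=qa` would point it at the QA copy, while an unset variable falls back to the test data. Accepting the environment as a mapping, rather than reading `os.environ` directly inside the function, keeps the selection logic easy to test.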

Now, as to your questions: as I mentioned above, segregating the data lets us make changes and implement new features rapidly, since the data we use is focused on what we are testing. We have three trunks, each with independent test data specific to what it needs to test: one set of tests for the View, another for the Model, and yet another for the Controller.

Lastly, we have an overarching set of integration tests that run whenever a new build is released. In every case but the last, the tests live with the component they were written for; since the integration tests verify all three pieces together, it makes sense to keep them separate from those pieces.
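Applied to the original question, the per-component layout described above might look like the following minimal Python sketch. It assumes small fixtures live next to each component in the source repository, while the large integration dataset lives elsewhere (for instance, a separate checkout such as the testRepo from the question), located through a hypothetical `INTEGRATION_DATA_DIR` environment variable. The directory layout and the function names are illustrative assumptions, not a convention of any particular tool.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

# Hypothetical layout, assumed for illustration:
#   src/model/tests/data/       small fixtures owned by the Model
#   src/view/tests/data/        small fixtures owned by the View
#   src/controller/tests/data/  small fixtures owned by the Controller
# The large integration dataset lives outside the source tree and is
# located through the INTEGRATION_DATA_DIR environment variable.
COMPONENTS = ("model", "view", "controller")


def component_data_dir(repo_root: Path, component: str) -> Path:
    """Return the fixture directory that lives alongside a component."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component {component!r}")
    return repo_root / "src" / component / "tests" / "data"


def integration_data_dir(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the external integration-data directory, or None if unavailable.

    None means the variable is unset or empty, or does not name an existing
    directory, so the caller can skip the integration suite instead of failing.
    """
    env = os.environ if env is None else env
    raw = env.get("INTEGRATION_DATA_DIR", "")
    if not raw:
        return None
    path = Path(raw).expanduser()
    return path if path.is_dir() else None
```

With pytest, for example, the integration suite could be marked with `pytest.mark.skipif(integration_data_dir() is None, reason="integration data not available")`, so contributors who have not fetched the large dataset can still run the component tests.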

I think your idea is a solid one.

Brian answered Oct 31 '22 20:10
