Suppose I'm developing some open-source software, written in an interpreted language and managed as a Git repo, that requires a large dataset (300+ MB) for basic testing.
Should the test data go into the same repository as the source code, with a compileToZipFile.sh script for publishing releases? Or would it be better to store them as two separate repositories, a srcRepo and a testRepo?
Any best practices/conventions would be appreciated.
I suppose the best answer to this question depends on your needs.
At my work, we segregate our code/test data by environment type like:
Certain environments have the same data as production, while others have older (or completely different) data. The benefits of this are:
Now, as to your questions: as I mentioned above, segregating the data allows us to make changes and implement new features rapidly, since the data we use is focused on what we are testing. We have three trunks, each with independent test data specific to what needs to be tested. When testing the View we have one set of tests, when testing the Model we have another, and when testing the Controller we have yet another. Lastly, we have an overarching set of integration tests that run when a new build is released. In all cases but the last, the tests live with the component they were created for; the integration tests, because they span all three pieces, are kept separate from them.
I think your idea is a solid one.
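If you do keep the test data in the same repository, the release script can simply leave it out of the published archive. Here is a rough sketch of that idea in Python, standing in for the compileToZipFile.sh you mention. The function name build_release_zip and the directory name testdata are my own placeholders, not anything from your project, so adjust them to your layout.

```python
import zipfile
from pathlib import Path


def build_release_zip(repo_root, zip_path, exclude_dirs=("testdata", ".git")):
    """Zip the files under repo_root, skipping any path inside an excluded directory.

    The directory names in exclude_dirs are hypothetical; use your own layout.
    Returns the list of archived paths, relative to repo_root.
    """
    repo_root = Path(repo_root)
    zip_path = Path(zip_path)
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(repo_root.rglob("*")):
            if not path.is_file():
                continue
            relative = path.relative_to(repo_root)
            # Skip anything that sits under an excluded directory name.
            if any(part in exclude_dirs for part in relative.parts):
                continue
            # Skip the archive itself if it is being written inside the repo.
            if path.resolve() == zip_path.resolve():
                continue
            archive.write(path, relative.as_posix())
            written.append(relative.as_posix())
    return written
```

Calling build_release_zip(".", "release.zip") from the repository root would then package the source without the large dataset, so the 300+ MB only affects people who clone the repo to run the tests.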