Right now, the devs all have a their local dev environments with a snapshot of the production database - which they can twist, churn and beat up the data without affecting anyone but themselves.
These snapshots are starting to get large, and a data import of them is starting to take close to an hour.
Any better recommendations at maintaining dev data? The dev data can be ripped apart for potential changes, and then need to be put back together if a change idea was bad, etc.
I try to use the following approach:
Developers maintain a baseline script which is in version control and sets up the database schema from scratch. It creates the schema just as it exists in the production database.
They also maintain a 'script' to setup test data. This 'script' uses actually production classes and sometimes a little DSL on top of that. In order to be reasonable fast the script generates only minimal testdata. I recommend making it part of the definition of done to create some testdate for any new feature build.
Developers can run these scripts at will on their database (or database schema). The first script is also used as a basis for running automatic database tests.
Result of any work done by the developers is a migration script. i.e. a script that can be applied to the production database to bring it to the new desired state, including updates to data.
These migrations can be tested on snapshots of the production database. Snapshots of the production database are also used to run load and performance tests.
Only for the snapshots I use database specific tools. Mostly everything else is written in the main programming language (java for me) so the developers feel comfortable using it.
I often encounter resistance to this approache ("too many scripts", "too many databases", "I don't want to use version control, because my db modelling tool doesn't support it"). But appart from loads of manual works I don't really see an alternative.
In my experience, having a centralized DB+data for each environment: Development, Testing+Integration and Production has been the best approach.
Usually, as a developer, you want a few things from the dev database set up.
You want it to be easy to work with - it should be straightforward to make changes, keep those changes versioned, and apply them to other environments.
You want to have representative data - and have that data be predictable. For instance, if you're building an invoicing system, you want clients with known credit limits so you can write test cases to track what happens to them as issue an invoice, have it paid etc. (Integration tests, rather than unit tests).
You want to be able to query against representative data volumes so performance issues arise in dev as well as production.
You never, ever want to be able to affect "real" data - for instance, you want email addresses and names to be anonymous, you want passwords to be re-set.
Continuous Database Integration offers a solution to most of this - and also solves the "it takes an hour to set up a database for a development environment" issue.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With