Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

recommendation for maintaining dev database

Right now, the devs all have a their local dev environments with a snapshot of the production database - which they can twist, churn and beat up the data without affecting anyone but themselves.

These snapshots are starting to get large, and a data import of them is starting to take close to an hour.

Any better recommendations at maintaining dev data? The dev data can be ripped apart for potential changes, and then need to be put back together if a change idea was bad, etc.

like image 434
cgmckeever Avatar asked Aug 14 '12 14:08

cgmckeever


3 Answers

I try to use the following approach:

Developers maintain a baseline script which is in version control and sets up the database schema from scratch. It creates the schema just as it exists in the production database.

They also maintain a 'script' to setup test data. This 'script' uses actually production classes and sometimes a little DSL on top of that. In order to be reasonable fast the script generates only minimal testdata. I recommend making it part of the definition of done to create some testdate for any new feature build.

Developers can run these scripts at will on their database (or database schema). The first script is also used as a basis for running automatic database tests.

Result of any work done by the developers is a migration script. i.e. a script that can be applied to the production database to bring it to the new desired state, including updates to data.

These migrations can be tested on snapshots of the production database. Snapshots of the production database are also used to run load and performance tests.

Only for the snapshots I use database specific tools. Mostly everything else is written in the main programming language (java for me) so the developers feel comfortable using it.

I often encounter resistance to this approache ("too many scripts", "too many databases", "I don't want to use version control, because my db modelling tool doesn't support it"). But appart from loads of manual works I don't really see an alternative.

like image 117
Jens Schauder Avatar answered Nov 17 '22 15:11

Jens Schauder


In my experience, having a centralized DB+data for each environment: Development, Testing+Integration and Production has been the best approach.

  • Development: let the developers do whatever they want with it. If production-like data is required, obfuscate/remove sensitive data. The more lightweight this database is, the better for you to move, maintain and backup.
  • Testing: use it to simulate the production environment and let the testers to input/retrieve all the data the want but only through your application interfaces. This environment also allows you to test your deployments before sending them to production, you don't want a bad DB installer to leave the production app in an unusable state. If required, you can input this environment with production data but obfuscate/remove sensitive data too. You could use high volumes to spot performance issues before they get to production.
  • Production: Leave your production data/environment alone, you don't want sensitive data to end up in the wrong hands or a DB error configuration to allow the developers to change data accidentally.
like image 28
davidmontoyago Avatar answered Nov 17 '22 16:11

davidmontoyago


Usually, as a developer, you want a few things from the dev database set up.

You want it to be easy to work with - it should be straightforward to make changes, keep those changes versioned, and apply them to other environments.

You want to have representative data - and have that data be predictable. For instance, if you're building an invoicing system, you want clients with known credit limits so you can write test cases to track what happens to them as issue an invoice, have it paid etc. (Integration tests, rather than unit tests).

You want to be able to query against representative data volumes so performance issues arise in dev as well as production.

You never, ever want to be able to affect "real" data - for instance, you want email addresses and names to be anonymous, you want passwords to be re-set.

Continuous Database Integration offers a solution to most of this - and also solves the "it takes an hour to set up a database for a development environment" issue.

like image 44
Neville Kuyt Avatar answered Nov 17 '22 16:11

Neville Kuyt