
Cutting a database down to size

Tags: sql, database

Say you have a database that has served a company for 10 years. It is 500 GB in size and has myriad tables, stored procedures, and triggers.

Now say you wish to create a cut-down version of the database to use as a test bed for integration testing, and for individual testers and developers to spin up their own instances to play around with.

In broad terms how would you set about this task?

In case it matters, the database I have in mind is SQL Server 2008.

Edit: removed "unit testing" because, of course, unit tests should not test database integration.

Ben Aston asked Feb 02 '23 19:02


1 Answer

If your tables all consisted of unrelated data, you could just pick X random records from each table. I'm guessing that the problem is that the tables are NOT unrelated, so if, say, table A includes a foreign key reference to table B and you just pulled 10% of the records from table A and 10% of the records from table B, you'd have a whole bunch of invalid references from A to B.
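To make the problem concrete, here is a minimal sketch that samples two related tables independently and counts the broken references. It uses Python's built-in `sqlite3` with an in-memory database purely as a runnable stand-in for SQL Server; the tables `a` and `b`, their columns, and the row counts are all made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real server; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES b(id));
""")
conn.executemany("INSERT INTO b (id) VALUES (?)", [(i,) for i in range(1, 101)])
conn.executemany("INSERT INTO a (id, b_id) VALUES (?, ?)",
                 [(i, i) for i in range(1, 101)])

# Sample 10% of each table independently, as a naive cut-down would.
conn.executescript("""
    CREATE TABLE b_small AS SELECT * FROM b ORDER BY RANDOM() LIMIT 10;
    CREATE TABLE a_small AS SELECT * FROM a ORDER BY RANDOM() LIMIT 10;
""")

# Count sampled rows of A whose parent row in B did not make the cut.
orphans = conn.execute("""
    SELECT COUNT(*)
    FROM a_small
    LEFT JOIN b_small ON b_small.id = a_small.b_id
    WHERE b_small.id IS NULL
""").fetchone()[0]
print(f"{orphans} of 10 sampled A rows point at a missing B row")
```

Because the two samples are drawn independently, most runs will report that the large majority of the sampled A rows are orphans.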

I don't know of a general solution to this problem. It depends on the exact structure of your database. I often find that my databases consist of a small number of "central" tables that have lots of references from other tables. That is, I generally find that I have, say, an Order table, and then there's an Order Line table that points to Order, and a Customer table that Order points to, and a Delivery table that points to Order or maybe Order Line, etc, but everything seems to center around "Order". In that case, you could randomly pick some number of Order records, then find all the Customers for those Orders, all the Order Lines for those Orders, etc. I usually also have some number of "code lookup" tables, like a list of all the "order status" codes, another list of all the "customer type" codes, etc. These are usually small, so I just copy them entirely.
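As a rough sketch of that approach, the following picks random orders, then pulls in their customers and order lines, and copies the lookup table whole. Again `sqlite3` is only a runnable stand-in; the schema, the `small_` table prefix, and `SAMPLE_SIZE` are assumptions for illustration. On SQL Server the random pick would more likely be written with `TOP (n) ... ORDER BY NEWID()`.

```python
import sqlite3

# Hypothetical order-centered schema in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_status (code TEXT PRIMARY KEY, label TEXT);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id),
                         status TEXT REFERENCES order_status(code));
    CREATE TABLE order_line (id INTEGER PRIMARY KEY,
                             order_id INTEGER REFERENCES orders(id),
                             sku TEXT);
""")
conn.executemany("INSERT INTO order_status VALUES (?, ?)",
                 [("N", "new"), ("S", "shipped")])
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, f"customer {i}") for i in range(1, 51)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, (i % 50) + 1, "N" if i % 2 else "S")
                  for i in range(1, 201)])
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                 [(i, (i % 200) + 1, f"sku-{i}") for i in range(1, 601)])

SAMPLE_SIZE = 20  # number of central rows to keep (arbitrary)

conn.executescript(f"""
    -- 1. Pick a random sample from the central table.
    CREATE TABLE small_orders AS
        SELECT * FROM orders ORDER BY RANDOM() LIMIT {SAMPLE_SIZE};
    -- 2. Pull the parent rows that the sampled orders point to.
    CREATE TABLE small_customer AS
        SELECT * FROM customer
        WHERE id IN (SELECT customer_id FROM small_orders);
    -- 3. Pull the child rows that point at the sampled orders.
    CREATE TABLE small_order_line AS
        SELECT * FROM order_line
        WHERE order_id IN (SELECT id FROM small_orders);
    -- 4. Lookup tables are small, so copy them whole.
    CREATE TABLE small_order_status AS SELECT * FROM order_status;
""")

# Sanity check: every sampled order should still find its customer.
dangling = conn.execute("""
    SELECT COUNT(*)
    FROM small_orders o
    LEFT JOIN small_customer c ON c.id = o.customer_id
    WHERE c.id IS NULL
""").fetchone()[0]
print("orders without a customer:", dangling)
```

Sampling only the central table and deriving everything else from it is what keeps the references valid; the other tables are never sampled on their own.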

If your database is more ... disjointed ... than that, i.e. if it doesn't have any clear centers but is a maze of interrelationships, this could be much more complicated. I think the same principle would apply, though. Pick SOME starting point, select some records from there, then get all the records connected to those records, etc.
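For the maze-like case, one possible sketch is to read the declared foreign keys and follow them mechanically from the chosen starting rows. The code below is an illustration under several assumptions, not a general tool: it uses SQLite's `PRAGMA foreign_key_list` to discover the relationships (SQL Server exposes the same information through its own catalog views), it assumes every table has a single integer key column named `id`, and the helper names are made up. It first walks down to child rows, then walks up to parent rows, which is one way to keep the subset from growing to the whole database.

```python
import sqlite3
from collections import defaultdict

def foreign_keys(conn):
    """Return (child_table, fk_column, parent_table) for every declared FK."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    edges = []
    for table in tables:
        for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
            edges.append((table, row[3], row[2]))  # row[3]=from col, row[2]=parent
    return edges

def ids_where(conn, table, column, values):
    """Ids of rows in `table` whose `column` is one of `values`."""
    if not values:
        return set()
    marks = ",".join("?" * len(values))
    sql = f"SELECT id FROM {table} WHERE {column} IN ({marks})"
    return {r[0] for r in conn.execute(sql, list(values))}

def parent_ids(conn, table, column, ids):
    """Non-null values of FK `column` for the rows of `table` with these ids."""
    if not ids:
        return set()
    marks = ",".join("?" * len(ids))
    sql = (f"SELECT DISTINCT {column} FROM {table} "
           f"WHERE id IN ({marks}) AND {column} IS NOT NULL")
    return {r[0] for r in conn.execute(sql, list(ids))}

def connected_subset(conn, seed_table, seed_ids):
    """Map each table to the ids reachable from the seed rows."""
    edges = foreign_keys(conn)
    chosen = defaultdict(set)
    chosen[seed_table] = set(seed_ids)
    # Phase 1: walk down to child rows that reference already-chosen rows.
    changed = True
    while changed:
        changed = False
        for child, col, parent in edges:
            new = ids_where(conn, child, col, chosen[parent]) - chosen[child]
            if new:
                chosen[child] |= new
                changed = True
    # Phase 2: walk up so every chosen row has its parent rows.
    changed = True
    while changed:
        changed = False
        for child, col, parent in edges:
            new = parent_ids(conn, child, col, chosen[child]) - chosen[parent]
            if new:
                chosen[parent] |= new
                changed = True
    return {table: ids for table, ids in chosen.items() if ids}

# Tiny hypothetical schema to exercise the traversal.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id));
    CREATE TABLE order_line (id INTEGER PRIMARY KEY,
                             order_id INTEGER REFERENCES orders(id));
    CREATE TABLE delivery (id INTEGER PRIMARY KEY,
                           order_line_id INTEGER REFERENCES order_line(id));
    INSERT INTO customer VALUES (1), (2), (3);
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2), (4, 3);
    INSERT INTO order_line VALUES (1, 1), (2, 1), (3, 2), (4, 3), (5, 4);
    INSERT INTO delivery VALUES (1, 1), (2, 4);
""")
subset = connected_subset(conn, "orders", {1, 3})
print(subset)
```

Starting from orders 1 and 3, this collects their order lines, the deliveries for those lines, and the two customers involved, but not customer 1's other order. Whether to follow child rows from the parents as well is a judgment call; doing so gives a more complete picture of each customer at the cost of a bigger subset.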

Jay answered Feb 05 '23 17:02