Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging databases how to handle duplicate PK's

We have three databases that are physically separated by region, one in LA, SF and NY. All the databases share the same schema but contain data specific to their region. We're looking to merge these databases into one and mirror it. We need to preserve the data for each region but merge them into one db. This presents quite a few issues for us, for example we will certainly have duplicate Primary Keys, and Foreign Keys will be potentially invalid.

I'm hoping to find someone who has had experience with a task like this who could provide some tips, strategies and words of experience on how we can accomplish the merge.

For example, one idea was to create composite keys and then change our code and sprocs to find the data via the composite key (region/original pk). But this requires us to change all of our code and sprocs.

Another idea was to just import the data and let it generate new PK's and then update all the FK references to the new PK. This way we potentially don't have to change any code.

Any experience is welcome!

like image 276
Denny Ferrassoli Avatar asked Jul 07 '09 18:07

Denny Ferrassoli


2 Answers

BEST: add a column for RegionCode, and include it on your PKs, but you don't want to do all the leg work.

HACK: if your IDs are INTs, a quick fix would be to add a fixed value based on region to each key on import. INTs can be as large as: 2,147,483,647

local server data:

LA IDs: 1,2,3,4,5,6
SF IDs: 1,2,3,4,5
NY IDs: 1,2,3,4,5,6,7,9

add 100000000 to LA's IDs
add 200000000 to SF's IDs
add 300000000 to NY's IDs

combined server data:

LA IDs: 100000001,100000002,100000003,100000004,100000005,100000006
SF IDs: 200000001,200000002,200000003,200000004,200000005
NY IDs: 300000001,300000002,300000003,300000004,300000005,300000006,300000007,300000009
like image 27
KM. Avatar answered Sep 29 '22 00:09

KM.


I have no first-hand experience with this, but it seems to me like you ought to be able to uniquely map PK -> New PK for each server. For instance, generate new PKs such that data from LA server has PK % 3 == 2, SF has PK % 3 == 1, and NY has PK % 3 == 0. And since, as I understood your question anyway, each server only stores FK relationships to its own data, you can update the FKs in identical fashion.

NewLA = OldLA*3-1
NewSF = OldLA*3-2
NewNY = OldLA*3

You can then merge those and have no duplicate PKs. This is essentially, as you already said, just generating new PKs, but structuring it this way allows you to trivially update your FKs (assuming, as I did, that the data on each server is isolated). Good luck.

like image 104
Nick Lewis Avatar answered Sep 28 '22 23:09

Nick Lewis