
Exporting/Importing a hierarchical graph from a database

I have a basic DB schema comprising two tables: one is a simple ID -> text list of terms, and the other has two columns, parent and child. The IDs in the first table are generated on insert by a DB sequence, while the second table maps parent keys to child keys to store the 'structure' of the hierarchy.

My problem is that I may sometimes want to move a tree from one DB to another. Say I have two DBs, each with 10 terms (Database A's terms != Database B's terms, and there's no overlap). If I just copy the data from A to B, I'll get an obvious problem: the terms will be renumbered but the relationships won't. Clearly in this example just adding 10 to all the relationship keys will work, but does anyone know of a general algorithm to do this?

The DB is Oracle 11g, and an Oracle-specific solution is fine...

PaulJWilliams asked Jan 26 '10 15:01



1 Answer

Quick answer

Import into a staging table, but populate the mapped ID values from the same sequence that produces ID values for the destination table. This avoids conflicts between ID values, because the DBMS engine supports concurrent access to sequences and never hands out the same value twice.

Once the ID values on the nodes are mapped (see below), re-mapping the ID values for the edges is trivial.

Longer answer

You will need a mechanism that maps the old keys from the source to the new keys in the destination. The way to do this is to create intermediate staging tables that hold the mappings between the old and new keys.

In Oracle, autoincrementing keys are usually done with sequences in much the way you've described. You need to construct staging tables with a placeholder for the 'old' key so you can do the re-mapping. Use the same sequence as used by the application to populate the ID values on actual destination database tables. The DBMS allows concurrent accesses to sequences and using the same sequence guarantees that you will not get collisions in the mapped ID values.

If you have a schema like:

create table STAGE_NODE (
       ID int
      ,STAGED_ID int
)
/

create table STAGE_EDGE (
       FROM_ID   int
      ,TO_ID     int
      ,OLD_FROM_ID int
      ,OLD_TO_ID int
)
/

This will allow you to import into the STAGE_NODE table, preserving the imported key values. The insert process puts the original ID from the imported table into STAGED_ID and populates ID from the sequence.

Make sure you use the same sequence that's used for populating the ID column in the destination table. This ensures that you won't get key collisions when you go to insert into the final destination table.
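The staging load described above could look roughly like the following. This is only a sketch: SRC_TERM and SRC_EDGE are hypothetical names for wherever the exported rows from the source database have landed, TERM_SEQ is a hypothetical name for the sequence that feeds the destination ID column, and treating PARENT as the 'from' side and CHILD as the 'to' side is an assumption about the asker's schema.

```sql
-- Hypothetical names: SRC_TERM / SRC_EDGE hold the rows exported from the
-- source database; TERM_SEQ is the sequence used by the destination table.
-- Each source node gets a fresh ID from the sequence; the original key is
-- kept in STAGED_ID so the edges can be re-mapped later.
insert into STAGE_NODE (ID, STAGED_ID)
select TERM_SEQ.nextval
      ,src.ID
  from SRC_TERM src
/

-- Edges are staged with their original keys only; FROM_ID and TO_ID stay
-- NULL until the node mapping is applied.
insert into STAGE_EDGE (OLD_FROM_ID, OLD_TO_ID)
select src.PARENT
      ,src.CHILD
  from SRC_EDGE src
/
```

Because the new IDs come straight from the destination's sequence, nothing needs to be known in advance about how many rows either database already holds.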

As a useful side effect, this will also allow the import to run while other operations are taking place on the table; concurrent NEXTVAL calls on a single sequence are safe, since each call returns a distinct value. If necessary, you can run this type of import process without bringing down the application.

Once you have this mapping in the staging table, ID values in the EDGE table are trivial to compute with a query like:

select node1.ID         as FROM_ID
      ,node2.ID         as TO_ID
  from STAGE_EDGE se
  join STAGE_NODE node1
    on node1.STAGED_ID = se.OLD_FROM_ID
  join STAGE_NODE node2
    on node2.STAGED_ID = se.OLD_TO_ID 

The mapped EDGE values can be populated back into the staging tables using an UPDATE query with a similar join or inserted directly into the destination table from a query similar to the one above.
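Both options can be sketched as below, assuming STAGED_ID is unique within STAGE_NODE (otherwise the scalar subqueries in the UPDATE would return more than one row and fail). TERM_EDGE with columns PARENT_ID and CHILD_ID is a hypothetical stand-in for the real destination relationship table.

```sql
-- Option 1: write the mapped keys back into the staging table.
-- Assumes each STAGED_ID appears exactly once in STAGE_NODE.
update STAGE_EDGE se
   set se.FROM_ID = (select n.ID
                       from STAGE_NODE n
                      where n.STAGED_ID = se.OLD_FROM_ID)
      ,se.TO_ID   = (select n.ID
                       from STAGE_NODE n
                      where n.STAGED_ID = se.OLD_TO_ID)
/

-- Option 2: skip the update and insert the mapped edges directly.
-- TERM_EDGE (PARENT_ID, CHILD_ID) is a hypothetical destination table.
insert into TERM_EDGE (PARENT_ID, CHILD_ID)
select node1.ID
      ,node2.ID
  from STAGE_EDGE se
  join STAGE_NODE node1
    on node1.STAGED_ID = se.OLD_FROM_ID
  join STAGE_NODE node2
    on node2.STAGED_ID = se.OLD_TO_ID
/
```

Option 1 leaves an auditable old-to-new mapping in the staging tables; option 2 is shorter if the staging data is thrown away after the import.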

ConcernedOfTunbridgeWells answered Nov 08 '22 13:11
