Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split Table into many to many relationship: Data Migration

I am wondering how best to migrate my data when splitting a Table into a many to many relationship. I've made a simplified example and I'll also post some of the solutions I have come up with. I am using a Postgresql Database.

Before Migration

Table Person


ID       Name        Pet        PetName
1        Follett     Cat        Garfield
2        Rowling     Hamster    Furry
3        Martin      Cat        Tom
4        Cage        Cat        Tom

After Migration

Table Person


ID       Name
1        Follett
2        Rowling
3        Martin
4        Cage

Table Pet


ID       Pet        PetName
6        Cat        Garfield
7        Hamster    Furry
8        Cat        Tom
9        Cat        Tom

Table PersonPet


FK_Person     FK_Pet
1             6
2             7
3             8
4             9

Notes:

  • I will specifically duplicate entries in the Pet Table (because in my case - due to other related data - one of them might still be editable by the customer while the other might not).
  • There is no column that uniquely identifies a "Pet" record.
  • For me it does not matter whether 3-8 and 4-9 are linked in the PersonPet table or 3-9 and 4-8.
  • Also I omitted all code that handles the schema changes of the tables as this is - in my understanding - irrelevant for this question.

My Solutions

  1. When creating the Pet Table temporarily add a column containing the id of the Person Table that was used to create this entry.

    ALTER TABLE Pet ADD COLUMN IdPerson INTEGER;

    INSERT INTO Pet (Pet, PetName, IdPerson)
    SELECT Pet, PetName, ID
    FROM Person;

    INSERT INTO PersonPet (FK_Person, FK_Pet)
    SELECT ID, IdPerson
    FROM Pet;

    ALTER TABLE Pet DROP Column IdPerson;
  1. avoid temporarily modifying the Pet table

    INSERT INTO Pet (Pet, PetName)
    SELECT Pet, PetName
    FROM Person;

    WITH
      CTE_Person
      AS
      (SELECT
        Id, Pet, PetName
        ,ROW_NUMBER() OVER (PARTITION BY Pet, PetName ORDER BY Id) AS row_number
      FROM Person
      )
      ,CTE_Pet
      AS
      (SELECT
        Id, Pet, PetName
        ,ROW_NUMBER() OVER (PARTITION BY Pet, PetName ORDER BY Id) AS row_number
      FROM Pet
      )
      ,CTE_Joined
      AS
      (SELECT
        CTE_Person.Id AS Person_Id,
        CTE_Pet.Id AS Pet_Id
      FROM
        CTE_Person
        INNER JOIN CTE_Pet ON
        CTE_Person.Pet = CTE_Pet.Pet
        CTE_Person.PetName = CTE_Pet.PetName
        AND CTE_Person.row_number = CTE_Pet.row_number
      )
      INSERT INTO PersonPet (FK_Person, FK_Pet)
      SELECT Person_Id, Pet_Id from CTE_Joined;

Questions

  1. Are both solutions correct? (I have tested the second solution and the result does seem to be correct, but I might have missed some corner case)
  2. What are the advantages/disadvantages of the two solution?
  3. Is there an easier way of doing the same data migration? (For my curiosity I would also be interested in answers that slightly modify my constraints (e.g. no duplicate entries in the Pet table), but please indicate which ones :) ).
like image 292
taranaki Avatar asked Oct 16 '15 09:10

taranaki


People also ask

How do you make a many-to-many relationship table?

When you need to establish a many-to-many relationship between two or more tables, the simplest way is to use a Junction Table. A Junction table in a database, also referred to as a Bridge table or Associative Table, bridges the tables together by referencing the primary keys of each data table.

How do you handle many-to-many relationships in data model?

When you have a many-to-many relationship between dimension-type tables, we provide the following guidance: Add each many-to-many related entity as a model table, ensuring it has a unique identifier (ID) column. Add a bridging table to store associated entities. Create one-to-many relationships between the three tables.

Why would we want to split data into separate tables?

In many cases, it may be best to split information into multiple related tables, so that there is less redundant data and fewer places to update.


1 Answers

Another solution to achieve effect you described (in my opinion the simplest one; without any CTE-s or additional columns):

create table Pet as
    select
        Id,
        Pet,
        PetName
    from 
        Person;

create table PersonPet as
    select
        Id as FK_Person,
        Id as FK_Pet
    from
        Person;

create sequence PetSeq;
update PersonPet set FK_Pet=nextval('PetSeq'::regclass);
update Pet p set Id=FK_Pet from PersonPet pp where p.Id=pp.FK_Person;

alter table Pet alter column Id set default nextval('PetSeq'::regclass);
alter table Pet add constraint PK_Pet primary key (Id);
alter table PersonPet add constraint FK_Pet foreign key (FK_Pet) references Pet(Id);

We are simply using existing person id as a temporary id for pet unless we generate one using sequence.

Edit

It's also possible to use my approach having schema changes already done:

insert into Pet(Id, Pet, PetName)
    select
        Id,
        Pet,
        PetName
    from
        Person;

insert into PersonPet(FK_Person, FK_Pet)
    select
        Id,
        Id
    from
        Person;

select setval('PetSeq'::regclass, (select max(Id) from Person));
like image 191
Radek Postołowicz Avatar answered Sep 28 '22 00:09

Radek Postołowicz