I am wondering how best to migrate my data when splitting a Table into a many to many relationship. I've made a simplified example and I'll also post some of the solutions I have come up with. I am using a Postgresql Database.
Before Migration
Table Person
ID Name Pet PetName
1 Follett Cat Garfield
2 Rowling Hamster Furry
3 Martin Cat Tom
4 Cage Cat Tom
After Migration
Table Person
ID Name
1 Follett
2 Rowling
3 Martin
4 Cage
Table Pet
ID Pet PetName
6 Cat Garfield
7 Hamster Furry
8 Cat Tom
9 Cat Tom
Table PersonPet
FK_Person FK_Pet
1 6
2 7
3 8
4 9
Notes:
My Solutions
ALTER TABLE Pet ADD COLUMN IdPerson INTEGER;
INSERT INTO Pet (Pet, PetName, IdPerson)
SELECT Pet, PetName, ID
FROM Person;
INSERT INTO PersonPet (FK_Person, FK_Pet)
SELECT ID, IdPerson
FROM Pet;
ALTER TABLE Pet DROP Column IdPerson;
INSERT INTO Pet (Pet, PetName)
SELECT Pet, PetName
FROM Person;
WITH
CTE_Person
AS
(SELECT
Id, Pet, PetName
,ROW_NUMBER() OVER (PARTITION BY Pet, PetName ORDER BY Id) AS row_number
FROM Person
)
,CTE_Pet
AS
(SELECT
Id, Pet, PetName
,ROW_NUMBER() OVER (PARTITION BY Pet, PetName ORDER BY Id) AS row_number
FROM Pet
)
,CTE_Joined
AS
(SELECT
CTE_Person.Id AS Person_Id,
CTE_Pet.Id AS Pet_Id
FROM
CTE_Person
INNER JOIN CTE_Pet ON
CTE_Person.Pet = CTE_Pet.Pet
CTE_Person.PetName = CTE_Pet.PetName
AND CTE_Person.row_number = CTE_Pet.row_number
)
INSERT INTO PersonPet (FK_Person, FK_Pet)
SELECT Person_Id, Pet_Id from CTE_Joined;
Questions
When you need to establish a many-to-many relationship between two or more tables, the simplest way is to use a Junction Table. A Junction table in a database, also referred to as a Bridge table or Associative Table, bridges the tables together by referencing the primary keys of each data table.
When you have a many-to-many relationship between dimension-type tables, we provide the following guidance: Add each many-to-many related entity as a model table, ensuring it has a unique identifier (ID) column. Add a bridging table to store associated entities. Create one-to-many relationships between the three tables.
In many cases, it may be best to split information into multiple related tables, so that there is less redundant data and fewer places to update.
Another solution to achieve effect you described (in my opinion the simplest one; without any CTE-s or additional columns):
create table Pet as
select
Id,
Pet,
PetName
from
Person;
create table PersonPet as
select
Id as FK_Person,
Id as FK_Pet
from
Person;
create sequence PetSeq;
update PersonPet set FK_Pet=nextval('PetSeq'::regclass);
update Pet p set Id=FK_Pet from PersonPet pp where p.Id=pp.FK_Person;
alter table Pet alter column Id set default nextval('PetSeq'::regclass);
alter table Pet add constraint PK_Pet primary key (Id);
alter table PersonPet add constraint FK_Pet foreign key (FK_Pet) references Pet(Id);
We are simply using existing person id as a temporary id for pet unless we generate one using sequence.
Edit
It's also possible to use my approach having schema changes already done:
insert into Pet(Id, Pet, PetName)
select
Id,
Pet,
PetName
from
Person;
insert into PersonPet(FK_Person, FK_Pet)
select
Id,
Id
from
Person;
select setval('PetSeq'::regclass, (select max(Id) from Person));
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With