Insert data and set foreign keys with Postgres

Tags: sql, postgresql, data-migration

I have to migrate a large amount of existing data in a Postgres DB after a schema change.

In the old schema a country attribute would be stored in the users table. Now the country attribute has been moved into a separate address table:

users:
  country # OLD
  address_id # NEW [1:1 relation]

addresses:
  id
  country

The schema is actually more complex and the address contains more than just the country. Thus, every user needs to have their own address (1:1 relation).

When migrating the data, I'm having problems setting the foreign keys in the users table after inserting the addresses:

INSERT INTO addresses (country) 
    SELECT country FROM users WHERE address_id IS NULL 
    RETURNING id;

How do I propagate the IDs of the inserted rows and set the foreign key references in the users table?

The only solution I could come up with so far is creating a temporary user_id column in the addresses table and then updating the address_id:

UPDATE users SET address_id = a.id FROM addresses AS a 
    WHERE users.id = a.user_id;

However, this turned out to be extremely slow (despite using indices on both users.id and addresses.user_id).

The users table contains about 3 million rows, of which about 300k are missing an associated address.

Is there any other way to insert derived data into one table and set the foreign key reference to the inserted data in the other (without changing the schema itself)?

I'm using Postgres 8.3.14.

Thanks

I have now solved the problem by migrating the data with a Python/sqlalchemy script. It turned out to be much easier (for me) than trying the same with SQL. Still, I'd be interested if anybody knows a way to process the RETURNING result of an INSERT statement in Postgres SQL.
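The SQLAlchemy script itself is not shown. As a rough illustration of the same row-by-row idea, here is a hypothetical sketch that uses Python's standard sqlite3 module as a stand-in database. The function name migrate_row_by_row is made up, and the table layout follows the question (users.id, users.address_id, addresses.id, addresses.country).

```python
import sqlite3


def migrate_row_by_row(conn: sqlite3.Connection) -> int:
    """Give every user without an address a fresh address of its own.

    Hypothetical sketch: for each such user, insert one address row, read
    back the generated id and store it in users.address_id.
    Returns the number of users that were linked.
    """
    todo = conn.execute(
        "SELECT id, country FROM users WHERE address_id IS NULL"
    ).fetchall()
    for user_id, country in todo:
        cur = conn.execute(
            "INSERT INTO addresses (country) VALUES (?)", (country,)
        )
        # In SQLite, lastrowid is the generated addresses.id. Against Postgres
        # one would append RETURNING id to the INSERT and use cur.fetchone()[0].
        conn.execute(
            "UPDATE users SET address_id = ? WHERE id = ?",
            (cur.lastrowid, user_id),
        )
    conn.commit()
    return len(todo)
```

This is simple but issues two statements per user, so for 300k affected rows a set-based SQL approach is usually considerably faster.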


asked Sep 12 '11 16:09

Pankrat

1 Answer

The table users must have some primary key that you did not disclose. For the purpose of this answer I will name it users_id.

You can solve this rather elegantly with data-modifying CTEs, introduced in PostgreSQL 9.1:

`country` is unique

The whole operation is rather trivial in this case:

WITH i AS (
    INSERT INTO addresses (country) 
    SELECT country
    FROM   users
    WHERE  address_id IS NULL 
    RETURNING id, country
    )
UPDATE users u
SET    address_id = i.id
FROM   i
WHERE  i.country = u.country;

You mention version 8.3 in your question. Upgrade! Postgres 8.3 has reached end of life.

Be that as it may, this is simple enough with version 8.3. You just need two statements:

INSERT INTO addresses (country) 
SELECT country
FROM   users
WHERE  address_id IS NULL;

UPDATE users u
SET    address_id = a.id
FROM   addresses a
WHERE  address_id IS NULL 
AND    a.country = u.country;
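To try the two statements on a scratch database first, here is a minimal sketch using Python's sqlite3 module as a stand-in. It is an assumption-laden adaptation, not the Postgres code itself: the hypothetical function link_unique_countries replaces UPDATE ... FROM with a correlated subquery, because older SQLite versions lack UPDATE ... FROM. It is only valid when every country occurs once; with duplicated countries the subquery would match several new addresses, and users would not each get their own address.

```python
import sqlite3


def link_unique_countries(conn: sqlite3.Connection) -> None:
    """Two-statement migration, valid only if each country occurs once."""
    # Statement 1: one new address for every user that still lacks one.
    conn.execute(
        "INSERT INTO addresses (country) "
        "SELECT country FROM users WHERE address_id IS NULL"
    )
    # Statement 2: link each user to the address carrying its (unique) country.
    conn.execute(
        "UPDATE users SET address_id = "
        "(SELECT a.id FROM addresses a WHERE a.country = users.country) "
        "WHERE address_id IS NULL"
    )
    conn.commit()
```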

`country` is not unique

That's more challenging. You could just create one address and link to it multiple times. But you did mention a 1:1 relationship that rules out such a convenient solution.

WITH s AS (
    SELECT users_id, country
         , row_number() OVER (PARTITION BY country) AS rn
    FROM   users
    WHERE  address_id IS NULL 
    )
    , i AS (
    INSERT INTO addresses (country) 
    SELECT country
    FROM   s
    RETURNING id, country
    )
    , r AS (
    SELECT *
         , row_number() OVER (PARTITION BY country) AS rn
    FROM   i
    )
UPDATE users u
SET    address_id = r.id
FROM   r
JOIN   s USING (country, rn)    -- select exactly one id for every user
WHERE  u.users_id = s.users_id
AND    u.address_id IS NULL;

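As there is no way to unambiguously assign exactly one id returned from the INSERT to every user in a set with identical country, I use the window function row_number() to number the rows per country on both sides and join on (country, rn). Because the INSERT is fed from s, both sides have the same number of rows per country, so every user gets exactly one address.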
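The pairing step can be illustrated outside the database. This is a pure-Python sketch with a hypothetical helper, pair_by_row_number (not part of any library): it numbers rows per country on both sides, like row_number() OVER (PARTITION BY country), and then joins on (country, rn). As in the SQL, the order within a country is arbitrary; any order yields a valid 1:1 pairing.

```python
from collections import defaultdict


def pair_by_row_number(users, inserted):
    """Pair (users_id, country) rows with inserted (address_id, country) rows.

    Returns {users_id: address_id}, one distinct address per user.
    """
    def number(rows):
        # (country, rn) -> key, with rn counting 1, 2, ... per country
        counters = defaultdict(int)
        numbered = {}
        for key, country in rows:
            counters[country] += 1
            numbered[(country, counters[country])] = key
        return numbered

    s = number(users)     # like CTE s: (country, rn) -> users_id
    r = number(inserted)  # like CTE r: (country, rn) -> address id
    # JOIN s USING (country, rn)
    return {s[k]: r[k] for k in s.keys() & r.keys()}
```

For example, users [(10, 'DE'), (11, 'DE'), (12, 'FR')] and inserted addresses [(100, 'DE'), (101, 'FR'), (102, 'DE')] pair up as {10: 100, 11: 102, 12: 101}.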

Not as straightforward with Postgres 8.3. One possible way:

INSERT INTO addresses (country) 
SELECT DISTINCT country -- pick just one per set of dupes
FROM   users
WHERE  address_id IS NULL;

UPDATE users u
SET    address_id = a.id
FROM   addresses a
WHERE  a.country = u.country
AND    u.address_id IS NULL
AND NOT EXISTS (    -- only addresses not yet linked to any user
    SELECT 1 FROM users x
    WHERE  x.address_id = a.id
    )
AND NOT EXISTS (    -- pick the smallest users_id per set of dupes
    SELECT 1 FROM users b
    WHERE  b.country = u.country
    AND    b.address_id IS NULL
    AND    b.users_id < u.users_id
    );

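Repeat both statements until the last NULL value is gone from users.address_id. Each round links one user per country, so the number of rounds equals the largest number of unlinked users sharing one country.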
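The repeat-until-done loop can be driven from a client script. Below is a minimal sketch, again with Python's sqlite3 as a stand-in and a hypothetical function migrate_in_rounds. It assumes every pre-existing address is already linked to a user. Each round inserts one address per distinct country that still has unlinked users, evaluates the pairing (an address no user references yet, matched with the smallest unlinked users_id of that country) as a SELECT, and applies the pairs with executemany. Users with a NULL country can never match, because `=` is never true for NULL, so the sketch stops with an error instead of looping forever.

```python
import sqlite3

# Step 1: one new address for every country that still has unlinked users.
INSERT_SQL = """
    INSERT INTO addresses (country)
    SELECT DISTINCT country FROM users WHERE address_id IS NULL
"""

# Step 2: pair the smallest unlinked users_id per country with an address
# of that country that no user references yet.
PAIR_SQL = """
    SELECT u.users_id, a.id
    FROM   users u
    JOIN   addresses a ON a.country = u.country
    WHERE  u.address_id IS NULL
    AND NOT EXISTS (SELECT 1 FROM users x WHERE x.address_id = a.id)
    AND NOT EXISTS (SELECT 1 FROM users b
                    WHERE  b.country = u.country
                    AND    b.address_id IS NULL
                    AND    b.users_id < u.users_id)
"""


def migrate_in_rounds(conn: sqlite3.Connection) -> int:
    """Repeat insert + link until no users.address_id is NULL.

    Returns the number of rounds performed.
    """
    rounds = 0
    while conn.execute(
        "SELECT 1 FROM users WHERE address_id IS NULL LIMIT 1"
    ).fetchone():
        conn.execute(INSERT_SQL)
        pairs = conn.execute(PAIR_SQL).fetchall()
        if not pairs:
            # e.g. NULL countries: country = country is never true for NULL
            raise RuntimeError("no user could be linked in this round")
        conn.executemany(
            "UPDATE users SET address_id = ? WHERE users_id = ?",
            [(address_id, users_id) for users_id, address_id in pairs],
        )
        rounds += 1
    conn.commit()
    return rounds
```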

answered Oct 16 '22 00:10

Erwin Brandstetter
