Index to find records where the foreign key does not exist

Tags: sql, indexing, postgresql, materialized-views, postgresql-performance

CREATE TABLE products (
  id bigint PRIMARY KEY  -- column type assumed
);

CREATE TABLE transactions (
  product_id bigint REFERENCES products (id)  -- column type assumed
);

The below SQL query is very slow:

SELECT products.* 
FROM   products 
       LEFT JOIN transactions 
              ON ( products.id = transactions.product_id ) 
WHERE  transactions.product_id IS NULL;

Out of 100 million product records, there might be only 100 where a product has no corresponding transactions.

I suspect the query is slow because it does a full table scan to find those few products without a matching transaction.

I want to create a partial index like this:

CREATE INDEX products_with_no_transactions_index 
ON (Left JOIN TABLE 
    BETWEEN products AND transactions) 
WHERE transactions.product_id IS NULL;

Is the above possible and how would I go about it?

Note: Some characteristics of this data set:

- Transactions are never deleted, only added.
- Products are never deleted but are added at a rate of hundreds per minute (obviously this is a made-up example standing in for a much more complex actual use case). A small percentage of them are temporarily orphaned.
- I need to query frequently (up to once per minute) and always need to know the current set of orphaned products.


asked Jan 02 '14 00:01

samol

1 Answer

The best I can think of is your last idea in the comments: a materialized view.

CREATE MATERIALIZED VIEW orphaned_products AS
SELECT *
FROM   products p
WHERE  NOT EXISTS (SELECT 1 FROM transactions t WHERE t.product_id = p.id);

Then you can use this table (a materialized view is just a table) as a drop-in replacement for the big table products in queries working with orphaned products, with an obviously great impact on performance (a few hundred rows instead of 100 million). Materialized views require Postgres 9.3, which is what you are using according to the comments. In earlier versions you can easily implement one by hand.

However, a materialized view is a snapshot and is not updated dynamically. (Refreshing it often could cancel out the performance benefit anyway.) To update it, you run the (expensive) operation:

REFRESH MATERIALIZED VIEW orphaned_products;

You could do that at strategically opportune points in time and have multiple subsequent queries benefit from it, depending on your business model.

Of course, you would have an index on orphaned_products.id, but that would not be very important for a small table of a few hundred rows.
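As a minimal sketch, assuming a reasonably recent PostgreSQL (9.5+ for the IF NOT EXISTS clauses) and a stripped-down stand-in schema where products has only a bigint id, the materialized view, its index, and a refresh-then-read cycle might look like this. The index name orphaned_products_id_idx is made up for illustration.

```sql
-- Minimal stand-in schema (assumption: the real tables have more columns).
CREATE TABLE IF NOT EXISTS products (id bigint PRIMARY KEY);
CREATE TABLE IF NOT EXISTS transactions (product_id bigint REFERENCES products (id));

-- Snapshot of all products that no transaction references yet.
CREATE MATERIALIZED VIEW IF NOT EXISTS orphaned_products AS
SELECT p.*
FROM   products p
WHERE  NOT EXISTS (SELECT 1 FROM transactions t WHERE t.product_id = p.id);

-- Hypothetical index name; cheap to maintain on a few hundred rows.
CREATE UNIQUE INDEX IF NOT EXISTS orphaned_products_id_idx
    ON orphaned_products (id);

-- Re-run the expensive anti-join once, then read the snapshot as often as needed.
REFRESH MATERIALIZED VIEW orphaned_products;
SELECT id FROM orphaned_products ORDER BY id;
```

A unique index like this is also what PostgreSQL 9.4 and later require for REFRESH MATERIALIZED VIEW CONCURRENTLY, which lets readers keep using the old snapshot during the refresh; it does not make the refresh itself any cheaper.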

If your model is such that transactions are never deleted, you could exploit that to great effect. Create a similar table by hand:

CREATE TABLE orphaned_products2 AS
SELECT *
FROM   products p
WHERE  NOT EXISTS (SELECT 1 FROM transactions t WHERE t.product_id = p.id);
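Refreshing this hand-made table by truncating and refilling it could be sketched as follows, assuming a recent PostgreSQL and the same minimal stand-in schema. The function name refresh_orphaned_products2 is hypothetical. Because TRUNCATE and INSERT run inside one function call, and therefore one transaction, concurrent readers wait for the commit instead of seeing an empty table.

```sql
-- Minimal stand-in schema (assumption: the real tables have more columns).
CREATE TABLE IF NOT EXISTS products (id bigint PRIMARY KEY);
CREATE TABLE IF NOT EXISTS transactions (product_id bigint REFERENCES products (id));

CREATE TABLE IF NOT EXISTS orphaned_products2 AS
SELECT p.*
FROM   products p
WHERE  NOT EXISTS (SELECT 1 FROM transactions t WHERE t.product_id = p.id);

-- Hypothetical helper: full rebuild, i.e. the expensive path the triggers avoid.
CREATE OR REPLACE FUNCTION refresh_orphaned_products2() RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
  TRUNCATE orphaned_products2;  -- ACCESS EXCLUSIVE lock: readers wait until commit
  INSERT INTO orphaned_products2
  SELECT p.*
  FROM   products p
  WHERE  NOT EXISTS (SELECT 1 FROM transactions t WHERE t.product_id = p.id);
END;
$$;

SELECT refresh_orphaned_products2();
```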

Of course you can refresh that "materialized view" just like the first one by truncating and refilling it. But the point is to avoid the expensive operation. All you actually need is:

- Add new products to orphaned_products2.
  Implement with a trigger AFTER INSERT ON products.
- Remove products from orphaned_products2 as soon as a referencing row appears in table transactions.
  Implement with a trigger AFTER UPDATE OF product_id ON transactions, but only if your model allows transactions.product_id to be updated, which would be unconventional.
  And another one AFTER INSERT ON transactions.

These are all comparatively cheap operations.
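The insert and update triggers from the list above might be sketched in PL/pgSQL like this, on the same minimal stand-in schema. All function and trigger names are made up; EXECUTE FUNCTION needs PostgreSQL 11 or later (older versions spell it EXECUTE PROCEDURE), and the sketch does not try to cover every concurrency edge case.

```sql
-- Minimal stand-in schema (assumption: the real tables have more columns).
CREATE TABLE IF NOT EXISTS products (id bigint PRIMARY KEY);
CREATE TABLE IF NOT EXISTS transactions (product_id bigint REFERENCES products (id));
CREATE TABLE IF NOT EXISTS orphaned_products2 AS
SELECT p.*
FROM   products p
WHERE  NOT EXISTS (SELECT 1 FROM transactions t WHERE t.product_id = p.id);

-- Every new product starts out as an orphan.
CREATE OR REPLACE FUNCTION orphan_on_product_insert() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO orphaned_products2 SELECT NEW.*;
  RETURN NULL;  -- ignored for AFTER ... FOR EACH ROW triggers
END;
$$;

CREATE TRIGGER products_orphan_ins
AFTER INSERT ON products
FOR EACH ROW EXECUTE FUNCTION orphan_on_product_insert();

-- A product stops being an orphan once a transaction references it.
CREATE OR REPLACE FUNCTION unorphan_on_transaction() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  DELETE FROM orphaned_products2 WHERE id = NEW.product_id;
  RETURN NULL;
END;
$$;

CREATE TRIGGER transactions_orphan_ins
AFTER INSERT ON transactions
FOR EACH ROW EXECUTE FUNCTION unorphan_on_transaction();

-- Only if product_id can be updated. This handles the new value only; the old
-- product may now lack transactions, which needs the same check as a delete.
CREATE TRIGGER transactions_orphan_upd
AFTER UPDATE OF product_id ON transactions
FOR EACH ROW EXECUTE FUNCTION unorphan_on_transaction();
```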

If transactions can be deleted, too, you'd need another trigger AFTER DELETE ON transactions to add orphaned products, which would be a bit more expensive. For every deleted transaction you need to check whether it was the last one referencing the related product, and add an orphan in that case. That may still be a lot cheaper than refreshing the whole materialized view.
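One way to sketch that check is a delete trigger like the one below. Names are hypothetical, and the check is only cheap if transactions.product_id is indexed, which this sketch assumes (the index name is made up as well). Row-level AFTER triggers fire once the whole DELETE statement has finished, so the NOT EXISTS test already sees every row the statement removed, and the second NOT EXISTS keeps a product from being listed twice.

```sql
-- Minimal stand-in schema (assumption: the real tables have more columns).
CREATE TABLE IF NOT EXISTS products (id bigint PRIMARY KEY);
CREATE TABLE IF NOT EXISTS transactions (product_id bigint REFERENCES products (id));
CREATE TABLE IF NOT EXISTS orphaned_products2 AS
SELECT p.*
FROM   products p
WHERE  NOT EXISTS (SELECT 1 FROM transactions t WHERE t.product_id = p.id);

-- Assumed index so the "any transactions left?" probe is an index lookup.
CREATE INDEX IF NOT EXISTS transactions_product_id_idx ON transactions (product_id);

CREATE OR REPLACE FUNCTION orphan_on_transaction_delete() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- Re-add the product only if no transaction references it any more
  -- and it is not listed already.
  INSERT INTO orphaned_products2
  SELECT p.*
  FROM   products p
  WHERE  p.id = OLD.product_id
  AND    NOT EXISTS (SELECT 1 FROM transactions t WHERE t.product_id = p.id)
  AND    NOT EXISTS (SELECT 1 FROM orphaned_products2 o WHERE o.id = p.id);
  RETURN NULL;  -- ignored for AFTER ... FOR EACH ROW triggers
END;
$$;

CREATE TRIGGER transactions_orphan_del
AFTER DELETE ON transactions
FOR EACH ROW EXECUTE FUNCTION orphan_on_transaction_delete();
```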

`VACUUM`

Given your additional information, I would also suggest custom settings for aggressive vacuuming of orphaned_products2, since it is going to produce a lot of dead rows (every row the triggers delete leaves a dead tuple behind).
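Per-table autovacuum storage parameters are one way to express such settings. The thresholds below are illustrative guesses, not tuned values; with the scale factors set to zero, autovacuum and autoanalyze kick in after a fixed number of dead or changed rows, independent of table size.

```sql
-- Stand-in so the statement runs on its own; in practice this is the table built above.
CREATE TABLE IF NOT EXISTS orphaned_products2 (id bigint);

ALTER TABLE orphaned_products2 SET (
  autovacuum_vacuum_scale_factor  = 0.0,  -- don't scale with table size
  autovacuum_vacuum_threshold     = 1000, -- vacuum after ~1000 dead rows (guess)
  autovacuum_analyze_scale_factor = 0.0,
  autovacuum_analyze_threshold    = 1000  -- re-analyze after ~1000 changed rows (guess)
);
```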


answered Sep 28 '22 00:09

Erwin Brandstetter

