Indexes on join tables

Tags:

When searching on Google for join table indexes, I got this question.

Now, I believe that it is giving some false information in the accepted answer, or I do not understand how everything works. Given the following tables (running on PostGreSQL 9.4):

CREATE TABLE "albums" ("album_id" serial PRIMARY KEY, "album_name" text)
CREATE TABLE "artists" ("artist_id" serial PRIMARY KEY, "artist_name" text)
CREATE TABLE "albums_artists" ("album_id" integer REFERENCES "albums", "artist_id" integer REFERENCES "artists")

I was trying to replicate the scenario from the question mentioned above, by creating first an index on both of the columns of the albums_artists table and then one index for each column (without keeping the index on both columns).

I would have been expecting very different results when using the EXPLAIN command for a normal, traditional select like the following one:

SELECT "artists".* FROM "test"."artists"
    INNER JOIN "test"."albums_artists" ON ("albums_artists"."artist_id" = "artists"."artist_id")
    WHERE ("albums_artists"."album_id" = 1)

However, when actually running explain on it, I get exactly the same result for each of the cases (with one index on each column vs. one index on both columns).

I've been reading the documentation on PostGreSQL about indexing and it doesn't make any sense on the results that I am getting:

Hash Join  (cost=15.05..42.07 rows=11 width=36) (actual time=0.024..0.025 rows=1 loops=1)
  Hash Cond: (artists.artist_id = albums_artists.artist_id)
  ->  Seq Scan on artists  (cost=0.00..22.30 rows=1230 width=36) (actual time=0.006..0.006 rows=1 loops=1)
  ->  Hash  (cost=14.91..14.91 rows=11 width=4) (actual time=0.009..0.009 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 1kB
        ->  Bitmap Heap Scan on albums_artists  (cost=4.24..14.91 rows=11 width=4) (actual time=0.008..0.009 rows=1 loops=1)
              Recheck Cond: (album_id = 1)
              Heap Blocks: exact=1
              ->  Bitmap Index Scan on albums_artists_album_id_index  (cost=0.00..4.24 rows=11 width=0) (actual time=0.005..0.005 rows=1 loops=1)
                    Index Cond: (album_id = 1)

I would expect to not get an Index scan at the last step when using an index composed by 2 different columns (since I am only using one of them in the WHERE clause).

I was about to open a bug in an ORM library that adds one index for both columns for join tables, but now I am not so sure. Can anyone help me understand why is the behavior similar in the two cases and what would actually be the difference, if there is any?

396

asked Dec 05 '15 22:12

Florin Asăvoaie

1 Answers

add a NOT NULL constraint on the key columns (a tuple with NULLs would make no sense here)
add a PRIMARY KEY (forcing a UNIQUE index on the two keyfields)
As a suport for FK lookups : add a compound index for the PK fields in reversed order
after creating/adding PKs and indexes, you may want to ANALYZE the table (only key columns have statistics)

CREATE TABLE albums_artists
    ( album_id integer NOT NULL REFERENCES albums (album_id)
    , artist_id integer NOT NULL REFERENCES artists (artist_id)
    , PRIMARY KEY (album_id, artist_id)
    );

CREATE UNIQUE INDEX ON albums_artists (artist_id, album_id);

The reason behind the observed behaviour is the fact that the planner/optimiser is information based, driven by heuristics. Without any information about the fraction of rows that will actually be needed given the conditions, or the fraction of rows that actually maches (in case of a JOIN), the planner makes a guess: (for example: 10% for a range query). For a small query, a hash join will always be a winning scenario, it does imply fetching all tuples from both tables, but the join itself is very efficient.

For columns that are part of a key or index, statistics will be collected, so the planner can make more realistic estimates about the amount of rows involved. Ald that will often result in an indexed plan, since that could need fewer pages to be fetched.

Foreign keys are a very special case; since the planner will know that all the values from the referring table will be present in the referred table. (that is 100%, assuming NOT NULL)

answered Sep 28 '22 02:09

wildplasser

Related questions
                            
                                Best alternative to set a PostgreSQL schema using PHP PDO
                            
                                Recursive query challenge - simple parent/child example
                            
                                can't compare offset-naive and offset-aware datetimes - last_seen option [duplicate]
                            
                                How to execute postgres' sql queries from batch file?
                            
                                Bind array param to native query
                            
                                Postgresql 10 - Parallel configuration
                            
                                Why am I getting a an error when creating a generated column in PostgreSQL?
                            
                                Difference between ALTER ROLE WITH CREATEDB and GRANT CREATE ON TABLESPACE
                            
                                Auto-tuning tools for Postgres? [closed]
                            
                                Elegant PostgreSQL Group by for Ruby on Rails / ActiveRecord
                            
                                Does Dapper work on Mono?
                            
                                Postgresql CASE statement - can I use the return value of the CASE within my SELECT?
                            
                                Using alias from JOIN statement in FROM
                            
                                get_or_create failure with Django and Postgres (duplicate key value violates unique constraint)
                            
                                Find the nearest points along the linestring in specified distance limit and order
                            
                                PostgreSQL multiple transactions on the same connection
                            
                                Slick issue when going with PostgreSQL
                            
                                Postgres Check if IP (inet) is IN a list of IP Ranges
                            
                                INSERT in single query into 2 tables postgresql
                            
                                psycopg2: How to execute vacuum postgresql query in python script

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Indexes on join tables

Tags:

database-indexes

postgresql

database-design

Florin Asăvoaie

People also ask

1 Answers

wildplasser

Recent Activity

Donate For Us