I'm trying to do a simple join between a table (players) and view (player_main_colors):
SELECT P.*, C.main_color FROM players P
OUTER LEFT JOIN player_main_colors C USING (player_id)
WHERE P.user_id=1;
This query is taking ~40 ms.
Here I'm using a nested SELECT on the VIEW instead of the JOIN:
SELECT player_id, main_color FROM player_main_colors
WHERE player_id IN (
SELECT player_id FROM players WHERE user_id=1);
This query is also taking ~40 ms.
When I split the query into its 2 pieces, it becomes fast, as I would have expected:
SELECT player_id FROM players WHERE user_id=1;
SELECT player_id, main_color FROM player_main_colors
where player_id in (584, 9337, 11669, 12096, 13651,
13852, 9575, 23388, 14339, 500, 24963, 25630,
8974, 13048, 11904, 10537, 20362, 9216, 4747, 25045);
These queries take ~0.5 ms each.
So why are the above queries with the JOIN or sub-SELECT so horribly slow and how can I fix it?
Here are some details about my tables and the view:
CREATE TABLE users (
user_id INTEGER PRIMARY KEY,
...
)
CREATE TABLE players (
player_id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users (user_id),
...
)
CREATE TABLE player_data (
player_id INTEGER NOT NULL REFERENCES players (player_id),
game_id INTEGER NOT NULL,
color INTEGER NOT NULL,
PRIMARY KEY (player_id, game_id, color),
active_time INTEGER DEFAULT 0,
...
)
CREATE VIEW player_main_colors AS
SELECT DISTINCT ON (1) player_id, color as main_color
FROM player_data
GROUP BY player_id, color
ORDER BY 1, MAX(active_time) DESC
It seems like it must be a problem with my VIEW...?
Here's an EXPLAIN ANALYZE for the nested SELECT query above:
Merge Semi Join (cost=1877.59..2118.00 rows=6851 width=8) (actual time=32.946..38.471 rows=25 loops=1)
Merge Cond: (player_data.player_id = players.player_id)
-> Unique (cost=1733.19..1801.70 rows=13701 width=12) (actual time=32.651..37.209 rows=13419 loops=1)
-> Sort (cost=1733.19..1767.45 rows=13701 width=12) (actual time=32.646..34.918 rows=16989 loops=1)
Sort Key: player_data.player_id, (max(player_data.active_time))
Sort Method: external merge Disk: 376kB
-> HashAggregate (cost=654.79..791.80 rows=13701 width=12) (actual time=13.636..19.051 rows=17311 loops=1)
-> Seq Scan on player_data (cost=0.00..513.45 rows=18845 width=12) (actual time=0.005..1.801 rows=18845 loops=1)
-> Sort (cost=144.40..144.53 rows=54 width=8) (actual time=0.226..0.230 rows=54 loops=1)
Sort Key: players.player_id
Sort Method: quicksort Memory: 19kB
-> Bitmap Heap Scan on players (cost=4.67..142.85 rows=54 width=8) (actual time=0.035..0.112 rows=54 loops=1)
Recheck Cond: (user_id = 1)
-> Bitmap Index Scan on test (cost=0.00..4.66 rows=54 width=0) (actual time=0.023..0.023 rows=54 loops=1)
Index Cond: (user_id = 1)
Total runtime: 39.279 ms
As for indexes, I only have 1 relevant one on top of the default ones for my primary keys:
CREATE INDEX player_user_idx ON players (user_id);
I'm currently using PostgreSQL 9.2.9.
Update:
I've reduced the problem below. See the difference between IN (4747) and IN (SELECT 4747).
Slow:
>> explain analyze SELECT * FROM (
SELECT player_id, color
FROM player_data
GROUP BY player_id, color
ORDER BY MAX(active_time) DESC
) S
WHERE player_id IN (SELECT 4747);
Hash Join (cost=1749.99..1975.37 rows=6914 width=8) (actual time=30.492..34.291 rows=4 loops=1)
Hash Cond: (player_data.player_id = (4747))
-> Sort (cost=1749.95..1784.51 rows=13827 width=12) (actual time=30.391..32.655 rows=17464 loops=1)
Sort Key: (max(player_data.active_time))
Sort Method: external merge Disk: 376kB
-> HashAggregate (cost=660.71..798.98 rows=13827 width=12) (actual time=12.714..17.249 rows=17464 loops=1)
-> Seq Scan on player_data (cost=0.00..518.12 rows=19012 width=12) (actual time=0.006..1.898 rows=19012 loops=1)
-> Hash (cost=0.03..0.03 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> HashAggregate (cost=0.02..0.03 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Total runtime: 35.015 ms
(12 rows)
Time: 35.617 ms
Fast:
>> explain analyze SELECT * FROM (
SELECT player_id, color
FROM player_data
GROUP BY player_id, color
ORDER BY MAX(active_time) DESC
) S
WHERE player_id IN (4747);
Subquery Scan on s (cost=17.40..17.45 rows=4 width=8) (actual time=0.035..0.035 rows=4 loops=1)
-> Sort (cost=17.40..17.41 rows=4 width=12) (actual time=0.034..0.034 rows=4 loops=1)
Sort Key: (max(player_data.active_time))
Sort Method: quicksort Memory: 17kB
-> GroupAggregate (cost=0.00..17.36 rows=4 width=12) (actual time=0.020..0.027 rows=4 loops=1)
-> Index Scan using player_data_pkey on player_data (cost=0.00..17.28 rows=5 width=12) (actual time=0.014..0.019 rows=5 loops=1)
Index Cond: (player_id = 4747)
Total runtime: 0.080 ms
(8 rows)
Time: 0.610 ms
You have both GROUP BY and DISTINCT ON in your VIEW definition. That's like shooting a dead man. Simplify:
CREATE VIEW player_main_colors AS
SELECT DISTINCT ON (1)
player_id, color AS main_color
FROM player_data
ORDER BY 1, active_time DESC NULLS LAST;
NULLS LAST is necessary to be equivalent to your original, because active_time can be NULL according to your table definition. Should be faster. But there is more. For best performance create these indexes:
CREATE INDEX players_up_idx ON players (user_id, player_id);
CREATE INDEX players_data_pa_idx ON player_data
(player_id, active_time DESC NULLS LAST, color);
Use DESC NULLS LAST in the index as well to match the sort order of the query. Or you alter player_data.active_time to NOT NULL and simplify all.
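If you go the NOT NULL route, it might look something like this (just a sketch; backfilling NULLs with 0 matches the column's DEFAULT 0, but whether that is the right interpretation of your data is an assumption):
-- Assumption: existing NULLs in active_time can be treated as 0 (the column default).
UPDATE player_data SET active_time = 0 WHERE active_time IS NULL;
ALTER TABLE player_data ALTER COLUMN active_time SET NOT NULL;

-- With NOT NULL enforced, NULLS LAST can be dropped from the view and the index:
DROP INDEX IF EXISTS players_data_pa_idx;
CREATE INDEX players_data_pa_idx ON player_data (player_id, active_time DESC, color);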
BTW, it's LEFT OUTER JOIN, not OUTER LEFT JOIN. Or just omit the noise word OUTER:
SELECT * -- equivalent here to "p.*, c.main_color"
FROM players p
LEFT JOIN player_main_colors c USING (player_id)
WHERE p.user_id = 1;
I'll assume there are lots of rows in player_data for each player_id. And you are only selecting a few player_id. JOIN LATERAL would be fastest for this case, but you need Postgres 9.3 or later for that. In pg 9.2 you can achieve a similar effect with correlated subqueries:
CREATE VIEW player_main_colors AS
SELECT player_id
, (SELECT color
FROM player_data
WHERE player_id = p.player_id
ORDER BY active_time DESC NULLS LAST
LIMIT 1) AS main_color
FROM players p
ORDER BY 1 -- optional
Subtle difference to your original view: this includes players without any entries in player_data. You could try the same query as above based on the new view. But I would not use a view at all. This is probably fastest:
SELECT *
, (SELECT color
FROM player_data
WHERE player_id = p.player_id
ORDER BY active_time DESC NULLS LAST
LIMIT 1) AS main_color
FROM players p
WHERE p.user_id = 1;
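For reference, once you're on Postgres 9.3 or later, the JOIN LATERAL variant mentioned above could look roughly like this (a sketch along the same lines, not tuned against your data):
SELECT p.*, c.main_color
FROM   players p
LEFT   JOIN LATERAL (
   SELECT d.color AS main_color
   FROM   player_data d
   WHERE  d.player_id = p.player_id
   ORDER  BY d.active_time DESC NULLS LAST
   LIMIT  1
   ) c ON true
WHERE  p.user_id = 1;
Like the correlated subquery, it returns one top-ranked color per player, keeps players that have no rows in player_data, and should be able to use the index on (player_id, active_time DESC NULLS LAST, color) suggested above.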