Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Table join order in postgres

Is there a way for me to force a specific join order in Postgres?

I've got a query that looks like this. I've eliminated a bunch of stuff that was in the real query, but this simplification demonstrates the issue. What's left shouldn't be too cryptic: Using a role/task security system, I'm trying to determine whether a given user has privileges to perform a given task.

select task.taskid
from userlogin
join userrole using (userloginid)
join roletask using (roleid)
join task using (taskid)
where loginname='foobar'
and taskfunction='plugh'

But I realized that the program already knows the value of userlogin, so it seemed the query could be made more efficient by skipping the lookup on userlogin and just filling in the userloginid, like this:

select task.taskid
from userrole
join roletask using (roleid)
join task using (taskid)
where userloginid=42
and taskfunction='plugh'

When I did that -- eliminating a table from the query and hard-coding the value retrieved from that table instead -- the explain plan time went up! In the original query, Postgres read userlogin then userrole then roletask then task. But in the new query, it decided to read roletask first, and then join to userrole, even though this required doing a full-file scan on roletask.

Full explain plans are:

Version 1:

Hash Join  (cost=12.79..140.82 rows=1 width=8) 
  Hash Cond: (roletask.taskid = task.taskid) 
  ->  Nested Loop  (cost=4.51..129.73 rows=748 width=8) 
        ->  Nested Loop  (cost=4.51..101.09 rows=12 width=8) 
              ->  Index Scan using idx_userlogin_loginname on userlogin  (cost=0.00..8.27 rows=1 width=8) 
                    Index Cond: ((loginname)::text = 'foobar'::text) 
              ->  Bitmap Heap Scan on userrole  (cost=4.51..92.41 rows=33 width=16) 
                    Recheck Cond: (userrole.userloginid = userlogin.userloginid) 
                    ->  Bitmap Index Scan on idx_userrole_login  (cost=0.00..4.50 rows=33 width=0) 
                          Index Cond: (userrole.userloginid = userlogin.userloginid) 
        ->  Index Scan using idx_roletask_role on roletask  (cost=0.00..1.50 rows=71 width=16) 
              Index Cond: (roletask.roleid = userrole.roleid) 
  ->  Hash  (cost=8.27..8.27 rows=1 width=8) 
        ->  Index Scan using idx_task_taskfunction on task  (cost=0.00..8.27 rows=1 width=8) 
              Index Cond: ((taskfunction)::text = 'plugh'::text) 

Version 2:

Hash Join  (cost=96.58..192.82 rows=4 width=8) 
  Hash Cond: (roletask.roleid = userrole.roleid) 
  ->  Hash Join  (cost=8.28..104.10 rows=9 width=16) 
        Hash Cond: (roletask.taskid = task.taskid) 
        ->  Seq Scan on roletask  (cost=0.00..78.35 rows=4635 width=16) 
        ->  Hash  (cost=8.27..8.27 rows=1 width=8) 
              ->  Index Scan using idx_task_taskfunction on task  (cost=0.00..8.27 rows=1 width=8) 
                    Index Cond: ((taskfunction)::text = 'plugh'::text) 
  ->  Hash  (cost=87.92..87.92 rows=31 width=8) 
        ->  Bitmap Heap Scan on userrole  (cost=4.49..87.92 rows=31 width=8) 
              Recheck Cond: (userloginid = 42) 
              ->  Bitmap Index Scan on idx_userrole_login  (cost=0.00..4.49 rows=31 width=0) 
                    Index Cond: (userloginid = 42) 

(Yes, I know that in both cases the costs are low and the difference doesn't look like it would matter. But this is after I eliminated a bunch of additional work from the query to simplify what I have to post. The real query still isn't outrageous, but I'm more interested in the principle.)

like image 322
Jay Avatar asked Sep 23 '09 20:09

Jay


2 Answers

This page in the documentation describes how to prevent the PostgreSQL optimizer from reordering joined tables, allowing you to control the order of joins yourself:

http://www.postgresql.org/docs/current/interactive/explicit-joins.html

like image 107
Bill Karwin Avatar answered Sep 23 '22 02:09

Bill Karwin


Are you sure your table statistics are up to date? When PostgreSQLs cost based optimizer fails with such trivial things it's a pretty good sign something is seriously wrong with the table statistics. It's better to fix the root cause than to work around it by overriding the built in optimizer because the problem will inevitably pop up somewhere else as well.

Run ANALYZE on the affected tables and see if it makes PostgreSQL pick a different plan. If it still chooses something silly it would be really interesting to see the query plans. The optimizer not doing the right thing is usually considered a bug.

like image 32
Ants Aasma Avatar answered Sep 23 '22 02:09

Ants Aasma