 

Nested Join vs Merge Join vs Hash Join in PostgreSQL

I know how the following join algorithms work:

  1. Nested Loop Join
  2. Merge Join
  3. Hash Join

I would like to know in which situations PostgreSQL uses each of them.

asked Feb 28 '18 07:02 by vinieth




1 Answer

The following are a few rules of thumb:

  • Nested loop joins are preferred if one of the sides of the join has few rows. Nested loop joins are also used as the only option if the join condition does not use the equality operator.

  • Hash joins are preferred if the join condition uses an equality operator, both sides of the join are large, and the hash table fits into work_mem.

  • Merge Joins are preferred if the join condition uses an equality operator and both sides of the join are large, but can be sorted on the join condition efficiently (for example, if there is an index on the expressions used in the join column).
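To make these rules of thumb concrete, here is a minimal Python sketch of the three strategies. It is an illustration only, not PostgreSQL's executor code: the function names, the tuple-based rows, and the sample data are all assumptions made for this example. It shows why a nested loop works with any join condition, while the hash and merge variants rely on equality on a join key.

```python
from collections import defaultdict
from operator import itemgetter


def nested_loop_join(outer, inner, predicate):
    """Compare every outer row with every inner row; any predicate works."""
    return [(o, i) for o in outer for i in inner if predicate(o, i)]


def hash_join(left, right, key):
    """Equality only: build a hash table on one side, probe it with the other."""
    table = defaultdict(list)
    for r in right:  # build phase
        table[key(r)].append(r)
    # probe phase: one hash lookup per left row
    return [(l, r) for l in left for r in table.get(key(l), [])]


def merge_join(left, right, key):
    """Equality only: walk two inputs that are sorted on the join key."""
    left = sorted(left, key=key)    # this sort is avoided if an index supplies the order
    right = sorted(right, key=key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # find the run of right rows that share this key
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # pair every left row with that key with the whole run
            while i < len(left) and key(left[i]) == kl:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


# Hypothetical sample rows: (join key, payload)
orders = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
customers = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
join_key = itemgetter(0)

equi_nl = nested_loop_join(orders, customers, lambda o, c: o[0] == c[0])
equi_hash = hash_join(orders, customers, join_key)
equi_merge = merge_join(orders, customers, join_key)

# A non-equality condition can only be evaluated by the nested loop.
less_than = nested_loop_join(orders, customers, lambda o, c: o[0] < c[0])
```

All three functions return the same pairs for the equality join. Only `nested_loop_join` can evaluate the `<` condition, because the other two depend on looking up or lining up equal keys.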

A typical OLTP query that selects a single row from one table and the associated rows from another table will normally use a nested loop join, because that is the only efficient method in this case.

Queries that join tables with many rows (which cannot be filtered out before the join) would be very inefficient with a nested loop join and will always use a hash or merge join if the join condition allows it.

The optimizer considers each of these join strategies and uses the one that promises the lowest costs. The most important factor on which this decision is based is the estimated row count from both sides of the join. Consequently, wrong optimizer choices are usually caused by misestimates in the row counts.
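The effect of a row count misestimate can be sketched with a toy cost model. The formulas below are deliberately crude operation counts invented for this illustration. They are not PostgreSQL's cost formulas, and they ignore work_mem, pre-sorted inputs, and I/O. The nested loop is assumed to do one index lookup on the inner side per outer row, and all names and row counts are hypothetical.

```python
import math


def toy_costs(outer_rows, inner_rows):
    """Rough operation counts per strategy. NOT PostgreSQL's cost units."""
    return {
        # assumed: one index lookup on the inner side for each outer row
        "nested loop": outer_rows * math.log2(max(inner_rows, 2)),
        # read both inputs once: build the hash table, then probe it
        "hash join": outer_rows + inner_rows,
        # sort both inputs, then read both once
        "merge join": (outer_rows * math.log2(max(outer_rows, 2))
                       + inner_rows * math.log2(max(inner_rows, 2))
                       + outer_rows + inner_rows),
    }


def choose(outer_rows, inner_rows):
    """Pick the strategy with the lowest toy cost."""
    costs = toy_costs(outer_rows, inner_rows)
    return min(costs, key=costs.get)


INNER_ROWS = 1_000_000
estimated_outer = 10         # what the planner believes
actual_outer = 1_000_000     # what the query really produces

plan = choose(estimated_outer, INNER_ROWS)   # chosen from the estimate
best = choose(actual_outer, INNER_ROWS)      # favoured by the true row count

actual = toy_costs(actual_outer, INNER_ROWS)
slowdown = actual[plan] / actual[best]
print(f"chosen: {plan}, best: {best}, slowdown: about {slowdown:.0f}x")
```

In this model the estimate of 10 outer rows selects the nested loop, while the actual row count favours the hash join, so the chosen plan does roughly ten times the work. In PostgreSQL itself, comparing the estimated and actual row counts in the output of EXPLAIN (ANALYZE) is the usual way to spot such a misestimate.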

answered Oct 04 '22 06:10 by Laurenz Albe