What is the difference between a hash join and a merge join (Oracle RDBMS)?

2 Answers

A "sort merge" join is performed by sorting the two data sets to be joined according to the join keys and then merging them together. The merge is very cheap, but the sort can be prohibitively expensive, especially if it spills to disk. The cost of the sort can be lowered if one of the data sets can be accessed in sorted order via an index, although accessing a high proportion of a table's blocks via an index scan can itself be very expensive compared with a full table scan.
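As a rough illustration of the sort-then-merge idea (not how Oracle implements it internally), here is a minimal in-memory sketch in Python. The function name `sort_merge_join` and the sample rows are made up for the example; rows are assumed to be dicts that share one join key.

```python
def sort_merge_join(left, right, key):
    """Equi-join two lists of dicts on `key`: sort both inputs, then merge."""
    # The sort phase is the expensive part; it could be skipped for an
    # input that already arrives in key order (e.g. read via an index).
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])

    out = []
    i = j = 0
    # The merge phase is a single forward pass over both sorted inputs.
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left-hand row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


emps = [
    {"dept_id": 20, "name": "b"},
    {"dept_id": 10, "name": "a"},
    {"dept_id": 30, "name": "c"},
]
depts = [{"dept_id": 10, "dept": "x"}, {"dept_id": 20, "dept": "y"}]
rows = sort_merge_join(emps, depts, "dept_id")
```

Note that the output comes back ordered by the join key, which is a side effect of the sort.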

A hash join is performed by hashing one data set (normally the smaller one) into memory on the join columns, then reading the other data set and probing the hash table for matches. The hash join is very low cost when the hash table can be held entirely in memory, with the total cost amounting to very little more than the cost of reading the data sets. The cost rises if the hash table has to spill to disk and the join completes as a one-pass operation, and rises considerably for a multipass operation.
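The build and probe phases can be sketched in a few lines of Python. This is only an in-memory illustration with made-up names (`hash_join`, `build`, `probe`) and sample rows; it ignores the spill-to-disk case entirely.

```python
from collections import defaultdict


def hash_join(build, probe, key):
    """Equi-join two lists of dicts on `key` using an in-memory hash table."""
    # Build phase: hash the (ideally smaller) input on the join key.
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)

    # Probe phase: read the other input once and look up each row's key.
    out = []
    for row in probe:
        for match in table.get(row[key], ()):
            out.append({**match, **row})
    return out


depts = [{"dept_id": 10, "dept": "x"}, {"dept_id": 20, "dept": "y"}]
emps = [
    {"dept_id": 20, "name": "b"},
    {"dept_id": 10, "name": "a"},
    {"dept_id": 30, "name": "c"},
]
rows = hash_join(depts, emps, "dept_id")
```

Each input is read exactly once, which is why the cost stays close to the cost of reading the data as long as the hash table fits in memory.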

(Before 10g, outer joins from a large table to a small one were problematic performance-wise: the optimiser could not reconcile the need to access the smaller table first for a hash join with the need to access the larger table first for an outer join. Consequently hash joins were not available in this situation.)

The cost of a hash join can be reduced by partitioning both tables on the join key(s). This allows the optimiser to infer that rows from a partition in one table will only find a match in a particular partition of the other table, and for tables having n partitions the hash join is executed as n independent hash joins. This has the following effects:

The size of each hash table is reduced, which lowers the maximum amount of memory required and may remove the need for temporary disk space altogether.
For parallel query operations the amount of inter-process messaging is vastly reduced, reducing CPU usage and improving performance, as each hash join can be performed by one pair of PQ processes.
For non-parallel query operations the memory requirement is reduced by a factor of n, and the first rows are projected from the query earlier.
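The partition-by-partition idea can be sketched as follows. This is a toy model, not Oracle's mechanism: in the database the partitions already exist on disk, whereas here both inputs are split in memory with a hypothetical `partition` helper that applies the same hash function to the join key on both sides. All names and sample rows are assumptions for the example.

```python
from collections import defaultdict


def partition(rows, key, n):
    """Split rows into n buckets by hashing the join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def hash_join(build, probe, key):
    """Plain in-memory hash join of two lists of dicts on `key`."""
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    return [{**m, **row} for row in probe for m in table.get(row[key], ())]


def partition_wise_hash_join(left, right, key, n):
    """Run n independent hash joins, one per pair of matching partitions."""
    out = []
    # Both sides use the same partitioning function, so a row in left
    # partition p can only match rows in right partition p.
    for lp, rp in zip(partition(left, key, n), partition(right, key, n)):
        # Each hash table holds roughly 1/n of the build side.
        out.extend(hash_join(lp, rp, key))
    return out


left = [{"k": i, "a": i * 10} for i in range(8)]
right = [{"k": i % 4, "b": i} for i in range(12)]
rows = partition_wise_hash_join(left, right, "k", 4)
```

The result holds the same rows as a single hash join over the whole inputs, possibly in a different order, but only one partition's hash table needs to be in memory at a time.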

You should note that hash joins can only be used for equi-joins, whereas merge joins are more flexible and can also handle inequality conditions such as <, <=, > and >=.

In general, if you are joining large amounts of data in an equi-join then a hash join is usually the better bet.

This topic is very well covered in the documentation.

http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/optimops.htm#i51523

12.1 docs: https://docs.oracle.com/database/121/TGSQL/tgsql_join.htm


answered Sep 30 '22 17:09

David Aldridge

For posterity: the Oracle tags had not been added to the question when I answered it, so my response is more applicable to MS SQL.

A merge join is the best case, as it exploits the ordering, resulting in a single pass down each table to do the join. If you have two tables (or covering indexes) that are ordered the same way, such as a primary key on one table and an index on that key in another, then joining them would result in a merge join.

A hash join is the next best, as it's usually done when one table has a (relatively) small number of rows. It effectively creates a temp table with a hash for each row, which is then searched repeatedly to perform the join.

The worst case is a nested loop, which is O(n * m): there is no ordering or size to exploit, and the join is simply, for each row in table x, a search of table y for matching rows.
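For comparison, a naive nested loop join looks like this in Python. It is a sketch with made-up names (`nested_loop_join`, `predicate`) and sample rows, and it assumes no index on the inner table, so every pair of rows is compared.

```python
def nested_loop_join(outer, inner, predicate):
    """Join two lists of dicts by testing every (outer, inner) pair."""
    out = []
    for o in outer:          # n iterations
        for i in inner:      # m iterations for each outer row
            if predicate(o, i):
                out.append({**o, **i})
    return out


xs = [{"x": 1}, {"x": 5}]
ys = [{"y": 3}, {"y": 7}]
# The join condition is an arbitrary function, so it need not be an equality.
rows = nested_loop_join(xs, ys, lambda o, i: o["x"] < i["y"])
```

The predicate runs n * m times regardless of how many rows match, which is where the O(n * m) cost comes from.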

answered Sep 30 '22 16:09

Spence


Tags:

performance

sql

join

oracle

Andrew Martinez
