This is a performance question simplified to join of two indexes. Take the following setup:
CREATE TABLE ZZ_BASE AS SELECT dbms_random.random AS ID, DBMS_RANDOM.STRING('U',10) AS STR FROM DUAL CONNECT BY LEVEL <=1000000;
CREATE INDEX ZZ_B_I ON ZZ_BASE(ID ASC);
CREATE TABLE ZZ_CHILD AS SELECT dbms_random.random AS ID, DBMS_RANDOM.STRING('U',10) AS STR FROM DUAL CONNECT BY LEVEL <=1000000;
CREATE INDEX ZZ_C_I ON ZZ_CHILD(ID ASC);
-- As @Flado pointed out, the following is required so index scanning can be done
ALTER TABLE ZZ_BASE MODIFY (ID CONSTRAINT NN_B NOT NULL);
ALTER TABLE ZZ_CHILD MODIFY (ID CONSTRAINT NN_C NOT NULL); -- given the join below not mandatory.
Now I want to LEFT OUTER JOIN these two tables and only output the already indexed ID field.
SELECT ZZ_BASE.ID
FROM ZZ_BASE
LEFT OUTER JOIN ZZ_CHILD ON (ZZ_BASE.ID = ZZ_CHILD.ID);
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 9765K| | 4894 (2)| 00:00:30 |
|* 1 | HASH JOIN OUTER | | 1000K| 9765K| 16M| 4894 (2)| 00:00:30 |
| 2 | INDEX FAST FULL SCAN| ZZ_B_I | 1000K| 4882K| | 948 (3)| 00:00:06 |
| 3 | INDEX FAST FULL SCAN| ZZ_C_I | 1000K| 4882K| | 948 (3)| 00:00:06 |
----------------------------------------------------------------------------------------
As you can see no table access is necessary, only index access. But according to common sense, HASH-joining is not the most optimal way to join these two indexes. If these two tables where much larger, a very large hash table would have to be made.
A much more efficient way would be to SORT-MERGE the two indexes.
SELECT /*+ USE_MERGE(ZZ_BASE ZZ_CHILD) */ ZZ_BASE.ID
FROM ZZ_BASE
LEFT OUTER JOIN ZZ_CHILD ON (ZZ_BASE.ID = ZZ_CHILD.ID);
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 9765K| | 6931 (3)| 00:00:42 |
| 1 | MERGE JOIN OUTER | | 1000K| 9765K| | 6931 (3)| 00:00:42 |
| 2 | INDEX FULL SCAN | ZZ_B_I | 1000K| 4882K| | 2258 (2)| 00:00:14 |
|* 3 | SORT JOIN | | 1000K| 4882K| 22M| 4673 (4)| 00:00:29 |
| 4 | INDEX FAST FULL SCAN| ZZ_C_I | 1000K| 4882K| | 948 (3)| 00:00:06 |
-----------------------------------------------------------------------------------------
But it appears that the second index gets sorted, even if it already is ("If an index exists, then the database can avoid sorting the first data set. However, the database always sorts the second data set, regardless of indexes"1)
Basically, what I want is a query that uses a SORT-MERGE join and instantly starts outputting the records, i.e.:
Nested Loops are used to join smaller tables. Further, nested loop join uses during the cross join and table variables. Merge Joins are used to join sorted tables. This means that Merge joins are utilized when join columns are indexed in both tables while Hash Match join uses a hash table to join equi joins.
The sort-merge join combines two sorted lists like a zipper. Both sides of the join must be sorted by the join predicates. A sort-merge join needs the same indexes as the hash join, that is an index for the independent conditions to read all candidate records in one shot.
In a SORT-MERGE join, Oracle sorts the first row source by its join columns, sorts the second row source by its join columns, and then merges the sorted row sources together. As matches are found, they are put into the result set.
The IsSorted property of the output that indicates whether the data has been sorted. This property must be set to True. Setting the value of the IsSorted property to True does not sort the data. This property only provides a hint to downstream components that the data has been previously sorted.
INDEX_ASC (or just INDEX) is the hint you might want to try in order to compare performance with real data.
I am a little surprised that you get any kind of index scan for the outer row source, since B*Tree indexes cannot find NULL keys and ZZ_BASE does not have a NOT NULL
constraint. Adding that and hinting a bit more will get you the full scan in index order of the ZZ_C_I index. That does not save you the SORT JOIN
step, unfortunately, but at least it should be much faster - O(n) - since the data is already sorted.
alter table zz_base modify (id not null);
SELECT
/*+ leading(zz_base) USE_MERGE(ZZ_CHILD)
index_asc(zz_base (id)) index(zz_child (id)) */ ZZ_BASE.ID
FROM ZZ_BASE left outer join ZZ_CHILD on zz_base.id=zz_child.id;
This query uses the following execution plan:
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 9765K| | 8241 (3)| 00:00:50 |
| 1 | MERGE JOIN OUTER | | 1000K| 9765K| | 8241 (3)| 00:00:50 |
| 2 | INDEX FULL SCAN | ZZ_B_I | 1000K| 4882K| | 2258 (2)| 00:00:14 |
|* 3 | SORT JOIN | | 1000K| 4882K| 22M| 5983 (3)| 00:00:36 |
| 4 | INDEX FULL SCAN| ZZ_C_I | 1000K| 4882K| | 2258 (2)| 00:00:14 |
------------------------------------------------------------------------------------
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With