I have looked for answers online but can't find a definitive one. For example, suppose you have two join clauses:
1. JOIN T2 ON T1.[ID] = T2.[ID]
2. JOIN T2 ON T1.[ID] = REPLACE(T2.[ID], 'A', '')
Now the second one performs worse because of the function in the join clause. What is the exact reason for this?
And if this code were in a stored procedure, what would be the best way to optimise it? Would it be to remove the REPLACE function from the join and apply it at the table level instead, so that the transformation is completed before any joins?
Any advice or links to further information would be great. Thanks
In your second example, you are attempting to find a record in T2 - but instead of comparing the stored T2.ID value directly against T1.ID, you are applying a function to T2.ID: REPLACE(T2.[ID], 'A', '').
If you had an index on T2.ID, at best it would scan the index rather than seek it - hence the performance difference.
This is where it gets harder to explain. The index is stored as a B+tree of the values of T2.ID in the table. The index understands that field and can search and sort by it, but it does not understand any logic applied to it.
It does not know whether REPLACE('A123', 'A', '') = '123' without executing the function on the value in the index and checking the resulting equality.
'AAA123' would also match, as would '1A23', '12A3', '123A' and so on - there is an effectively unbounded number of combinations that would in fact match - but the only way the engine can figure out whether a single index entry matches is to run that value through the function and then check the equality.
If it can only figure that out by running the index value through the function, then it can only answer the query correctly if it does that for every entry in the index - i.e. an index scan, with every entry passed into the function and the output checked.
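Putting the two joins from the question side by side makes this concrete. This is only a sketch (T1 and T2 stand in for real tables, as in the question), and the seek-versus-scan behaviour described in the comments is what you would expect to see when comparing the execution plans:

```sql
-- Sargable join: T2.[ID] is compared directly, so an index
-- on T2.[ID] can be seeked for each qualifying T1 row.
SELECT T1.[ID]
FROM T1
JOIN T2 ON T1.[ID] = T2.[ID];

-- Non-sargable join: REPLACE must be evaluated against every
-- index entry, so at best the index on T2.[ID] is scanned.
SELECT T1.[ID]
FROM T1
JOIN T2 ON T1.[ID] = REPLACE(T2.[ID], 'A', '');
```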
As Jeroen mentions, the term for this is SARGable or SARGability (Search ARGument-ABLE), although I personally prefer to explain it as Seek ARGument-ABLE, since that is a closer match to the query plan operator.
It should be noted that this concept has nothing to do with the predicate being part of a join - any predicate in SQL has this restriction. A single-table query with a WHERE predicate can have the same issue.
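For example, the same restriction shows up in a plain single-table query (the table and value here are taken from the question's setup):

```sql
-- Non-sargable: the function hides the column from the index,
-- forcing every entry to be evaluated.
SELECT * FROM T2 WHERE REPLACE(ID, 'A', '') = '123';

-- Sargable: the bare column can be seeked in the index.
SELECT * FROM T2 WHERE ID = 'A123';
```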
Can this problem be avoided? It can, but only in some instances - where you can reverse the operation.
Consider a table with an ID column. I could construct a predicate such as this:
WHERE ID * 2 = @paramValue
The only way SQL Server can know whether an ID entry multiplied by 2 equals the passed-in value is to process every entry, double it and check - so that is the index scan scenario again.
In this instance we can rewrite it:
WHERE ID = @paramValue / 2.0
Now SQL Server performs the arithmetic once - dividing the passed-in value - and can then check the result against the index in a seekable manner. The rewritten SQL looks like a trivially different way of stating the same problem, but it makes a very large difference to how the database can resolve the predicate.
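Another common instance of the same "reverse the operation" idea involves date functions. This is an illustration with a hypothetical Orders table and OrderDate column, not from the question itself:

```sql
-- Non-sargable: YEAR() must be evaluated for every row.
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2023;

-- Sargable: the bare column is compared against a range,
-- so an index on OrderDate can be seeked.
SELECT * FROM Orders
WHERE OrderDate >= '20230101' AND OrderDate < '20240101';
```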
SQL Server has four basic methods for handling joins (as do other databases):

- Nested loop joins, which behave like nested for loops and are usually the slowest method.
- Index-based joins, where matching rows in one table are looked up via an index on the other.
- Merge joins, which require both inputs to be sorted on the join key.
- Hash joins, which build a hash table on one of the inputs.

In general, the first of these is the slowest and the second -- using an index -- is often the fastest (there are exceptions).
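If you want to see these algorithms appear in a plan, T-SQL lets you force a physical join algorithm with a join hint. This is a sketch for experimentation only (using the T1/T2 tables from the question) - join hints generally should not be left in production code, because they override the optimizer's choice:

```sql
-- Force each physical join algorithm in turn and compare plans.
SELECT T1.[ID] FROM T1 INNER LOOP  JOIN T2 ON T1.[ID] = T2.[ID];
SELECT T1.[ID] FROM T1 INNER MERGE JOIN T2 ON T1.[ID] = T2.[ID];
SELECT T1.[ID] FROM T1 INNER HASH  JOIN T2 ON T1.[ID] = T2.[ID];
```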
When you use an equality comparison between two columns in the tables, SQL Server has a lot of information for deciding on the best join algorithm to use (column statistics, available indexes, table sizes).
Without this information, SQL Server often defaults to the nested-loop join. I find that it does this even when it could use the expression for a merge- or hash-based join.
As a note, you can work around this by using a computed column:
alter table t2 add id_no_a as (replace(id, 'A', '')) persisted;
create index idx_t2_id_no_a on t2(id_no_a);
Then phrase the join as:
on T1.[ID] = t2.id_no_a
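Putting that together, the full workaround could look like this (a sketch, with the column names assumed from the question):

```sql
-- Persisted computed column holding the pre-stripped value,
-- plus an index on it, so the join comparison becomes sargable.
ALTER TABLE t2 ADD id_no_a AS (REPLACE(id, 'A', '')) PERSISTED;
CREATE INDEX idx_t2_id_no_a ON t2 (id_no_a);

-- The join now compares bare columns, so the index on
-- id_no_a can be seeked instead of scanned.
SELECT T1.[ID]
FROM T1
JOIN t2 ON T1.[ID] = t2.id_no_a;
```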