Does having duplicate parameters in the IN clause affect performance of the query compared to eliminating duplicates before executing the query?
SELECT * FROM table WHERE column IN ('A', 'B', 'C', 'A', 'A')
vs
SELECT * FROM table WHERE column IN ('A', 'B', 'C')
I assemble the query programmatically through Java and am weighing whether I should use
I'm assuming the performance might not be significant, but would like to know the best practice moving forward.
Having duplicates will not decrease performance in a noticeable way, at least not by itself. However, it may have an indirect effect on the performance if the number of items changes between the queries, forcing a re-computation of a query plan on the server side.
Assuming that your query is parameterized, and there is a known limit to the number of IN list elements, it is better to have a fixed number of parameters in a prepared query, and bind NULLs to unused elements of the IN lists, with or without duplicates, than re-generating your query all the time.
If your query is not parameterized (be very careful with that) you would be better off not only eliminating the duplicates, but also ordering your unique items in the same way (say, by using a TreeSet). Otherwise, queries with IN lists of ('A', 'B', 'C') and ('A', 'C', 'B') would be considered different, triggering a re-computation of the query plan.
Another issue that you may run into if you keep duplicates is the maximum length of an IN list. Oracle sets the limit to about a thousand, so a list with duplicates may go past the limit even with the number of unique items well within the allowed maximum.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With