I'm selecting some rows from a table valued function but have found an inexplicable massive performance difference by putting SELECT TOP in the query.
SELECT col1, col2, col3 etc
FROM dbo.some_table_function
WHERE col1 = @parameter
--ORDER BY col1
is taking upwards of 5 or 6 mins to complete.
However
SELECT TOP 6000 col1, col2, col3 etc
FROM dbo.some_table_function
WHERE col1 = @parameter
--ORDER BY col1
completes in about 4 or 5 seconds.
This wouldn't surprise me if the returned set of data were huge, but the particular query involved returns ~5000 rows out of 200,000.
So in both cases, the whole of the table is processed, as SQL Server continues to the end in search of 6000 rows which it will never get to. Why the massive difference then? Is this something to do with the way SQL Server allocates space in anticipation of the result set size (the TOP 6000 thereby giving it a low requirement which is more easily allocated in memory)? Has anyone else witnessed something like this?
Thanks
For your question just use SELECT *. If you need all the columns there's no performance difference. Save this answer.
SELECT * returns more data than required to the client, which in turn will use more network bandwidth. This increase in network bandwidth also means that data will take a longer time to reach the client application, which could be SSMS or your Java application server.
Table valued functions can have a non-linear execution time.
Let's consider function equivalent for this query:
SELECT (
SELECT SUM(mi.value)
FROM mytable mi
WHERE mi.id <= mo.id
)
FROM mytable mo
ORDER BY
mo.value
This query (that calculates the running SUM
) is fast at the beginning and slow at the end, since on each row from mo
it should sum all the preceding values which requires rewinding the rowsource.
Time taken to calculate SUM
for each row increases as the row numbers increase.
If you make mytable
large enough (say, 100,000
rows, as in your example) and run this query you will see that it takes considerable time.
However, if you apply TOP 5000
to this query you will see that it completes much faster than 1/20
of the time required for the full table.
Most probably, something similar happens in your case too.
To say something more definitely, I need to see the function definition.
Update:
SQL Server
can push predicates into the function.
For instance, I just created this TVF
:
CREATE FUNCTION fn_test()
RETURNS TABLE
AS
RETURN (
SELECT *
FROM master
);
These queries:
SELECT *
FROM fn_test()
WHERE name = @name
SELECT TOP 1000 *
FROM fn_test()
WHERE name = @name
yield different execution plans (the first one uses clustered scan, the second one uses an index seek with a TOP
)
I had the same problem, a simple query joining five tables returning 1000 rows took two minutes to complete. When I added "TOP 10000" to it it completed in less than one second. It turned out that the clustered index on one of the tables was heavily fragmented.
After rebuilding the index the query now completes in less than a second.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With