Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server, Using UNION ALL for multiple tables then paging implementation

I need help also about paging and using UNION ALL for multiple tables:

How do i implement an optimized paging when joining multiple tables using UNION ALL and returning only specific number of rows...


declare @startRow int
declare @PageCount int

set @startRow = 0
set @PageCount = 20

set rowcount @PageCount

select Row_Number() OVER(Order by col1) as RowNumber, col1, col2
from
(
    select col1, col2 from table1 where datetimeCol between (@dateFrom and @dateTo)
    union all
    select col1, col2 from table2 where datetimeCol between (@dateFrom and @dateTo)
    union all
    select col1, col2 from table3 where datetimeCol between (@dateFrom and @dateTo)
    union all
    select col1, col2 from table4 where datetimeCol between (@dateFrom and @dateTo)
    union all
    select col1, col2 from table5 where datetimeCol between (@dateFrom and @dateTo)
) as tmpTable
where RowNumber > @startRow

table 3, 4, & 5 have huge number of row (millions of rows) where table 1 & 2 may only have few thousand rows.

If startRow is "0", I only expect data from Row 1 to 20 (from Table1). I'm getting the correct result but has a high overhead on the remaining table while sql server tries to all all the data and filter it....

the longer the interval of the @dateFrom and @dateTo makes my query significantly slower while trying to retrieve only few rows from the overall result set

Please help how i can implement a simple but better approach with a similar logic. :(

like image 499
Robert Dominic Angelo Avatar asked Jun 06 '13 17:06

Robert Dominic Angelo


People also ask

Can you UNION all 3 tables?

Conclusion. Combining several tables to one large table is possible in all 3 ways. As we have seen, the behavior of UNION in SQL Server and UNION in DAX within Power BI is very similar.

How can improve UNION all performance in SQL Server?

Use UNION ALL instead of UNION whenever is possible That is why UNION ALL is faster. Because it does not remove duplicated values in the query. If there are few rows (let's say 1000 rows), there is almost no performance difference between UNION and UNION ALL. However, if there are more rows, you can see the difference.

Will use UNION to include duplicate rows which concatenating two or more tables?

The Union operator combines the results of two or more queries into a single result set that includes all the rows that belong to all queries in the Union. In simple terms, it combines the two or more row sets and keeps duplicates.


2 Answers

Consider using OFFSET FETCH clause (works starting with MSSQL 2012):

declare @startRow int
declare @PageCount int

set @startRow = 0
set @PageCount = 20


select col1, col2
from
(
    select col1, col2 from table1 where datetimeCol between (@dateFrom and @dateTo)
    union all
    select col1, col2 from table2 where datetimeCol between (@dateFrom and @dateTo)
    union all
    select col1, col2 from table3 where datetimeCol between (@dateFrom and @dateTo)
    union all
    select col1, col2 from table4 where datetimeCol between (@dateFrom and @dateTo)
    union all
    select col1, col2 from table5 where datetimeCol between (@dateFrom and @dateTo)
) as tmpTable
order by col1
offset @startRow rows
fetch next @PageCount rows only

I also want to mention here, why this query always takes O(n*log(n)) time.
To execute this query, database needs to:

  1. to union multiple lists into one list - takes O(n) time for each table, where n - total number of rows in your tables;
  2. sort list by col1 - takes O(n*log(n)), where n - is total number of rows
  3. traverse the list in sorted order, skip @startRow rows, take next @PageCount rows.

As you can see, you need to union and sort all data to get the expected result (number 3).

If the performance of this query is still poor and you want to increase in, try to:

  • create clustred index based on col1 in all tables
  • create a non-clusteres index based on col1 in all tables and **inlude all other columns that you want to output in select list**.
like image 115
Alexander Bolgov Avatar answered Oct 14 '22 03:10

Alexander Bolgov


There maybe an issue with your database design since you have 5 similar tables. But besides this, you could materialize your UNION ALL query into a permanent table or a temp #-table with appropriate indexes on it and finally paginate over materialized data set with ROW_NUMBER() clause.

like image 28
YuGagarin Avatar answered Oct 14 '22 04:10

YuGagarin