Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating pagination from multiple sources

I need to create an HTML table with pagination. The data comes from 2 different sources (could be 2 tables from 2 different databases like one Oracle and another is MySQL) which you can't use joined select statement. To make it more complicated, I need to display the data sorted by timestamp (one of the property is timestamp) in ascending order.

For example, source A has 45 records, source B has 55 records. So the table will display total records of 100, but only display let's say 15 records at a time. So there has to be 7 pages (6 pages with 15 records and 1 page with 10 records).

The above example are just a total of 100 records which might be easy for the memory to load them all. But in actual production, it could be thousands or millions records. Does anyone know any algorithm that I can use? The parameters that I can provide are page number and the number of record per page.

like image 955
Wins Avatar asked Aug 14 '12 15:08

Wins


1 Answers

As I understand, your concern is memory.

If individual tables (A and B) are not sorted by timestamp then you need to merge all their records into one file and then use some file-based sorting algorithm (something like MergeSort, in one pass you get sorted pairs of records, in the 2nd pass you get sorted 4s etc.). When you have a file with all the records in ascending order of timestamps you can break it into pages.

If the tables are already sorted than you need to merge N sorted sequences into one. I suggest you organize some kind of a Heap to keep track of which of N sources has the item with the smallest timestamp. In pseudocode it would look like this:

for i=1,N
{
  Add the 1st record from each table to the Heap
}
while(Heap not empty)
{
  x = take the smallest item from the heap, noting which table j this record belonged to
  Add x to output
  if (the j-th table is not completely processed)
  {
    take the next value from the j-th table and insert it into the heap
  }
}

The complexity is O(M*logN) where M is the total number of records in the tables and N is the number of tables. This whole Heap thing is only worth the hassle if N is sufficiently large (my guess is ~100). Otherwise I would go with linear search and O(N*M).

like image 88
Alexander Chertov Avatar answered Sep 17 '22 18:09

Alexander Chertov