Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Offset/limit to page/size conversion

This should be an interresting challenge. I'm looking for an algorithm that doesn't exist yet (to the best of my knowledge)

  • We have a database accessing function that can read pages of records at a time, using a page-number and page-size as arguments. Lets call this function getFromDatabase(int page, int size).
  • We want to offer a REST API which should return records based on an offset and limit. Lets wrap this in a function getRecords(int offset, int limit).

Somehow we must use the given offset and limit to retrieve the matching database records which can only be access by page and size. Obviously, offset/limit won't always map to a single page/size. The challenge is finding an algorihm that makes the "ideal" number of getFromDatabase calls to retrieve all records. This algorithm should take several factors into account:

  • Each call to getFromDatabase has a certain overhead cost; keep calls to a minimum.
  • Each record retrieved adds additonal overhead; retrieve as few records as possible (it's okay to retrieve "waste" if it decreases total overhead).
  • The algorithm itself has overhead costs as well; obviously they should not outweight any benefits.

I've come up with the following algorithm: http://jsfiddle.net/mwvdlee/A7J9C/ (JS code, but the algorithm is language-agnostic). Essentially it is the following pseudocode:

do {
    do {
        try to convert (offset,limit) to (page,size)
        if too much waste
            lower limit by some amount
        else
            call `getDatabaseRecords()`
            filter out waste records
            increase offset to first record not yet retrieved
            lower limit to last records not yet retrieved              
    } until some records were retrieved
} until all records are retrieved from database

The key of this algorithm is in determining too much waste and some amount. But this algorithm isn't optimal, nor is it guarenteed to be complete (it may well be, I just can't proof it).

Are there any better (known?) algorithms, or improvements I could make? Does anybody have good ideas of how to tackle this problem?

like image 220
Martijn Avatar asked Jul 02 '14 14:07

Martijn


People also ask

What is offset and limit in pagination?

The limit option allows you to limit the number of rows returned from a query, while offset allows you to omit a specified number of rows before the beginning of the result set. Using both limit and offset skips both rows as well as limit the rows returned.

What is offset for pagination?

Offset-based pagination is a very famous technique wherein the client requests parameters with a specific limit (the number of results) and offset (the number of records that need to be skipped). Offset-based pagination is easy to use and is preferred for static data.

Why you shouldn't use offset and limit for your pagination?

Because the offset counts the rows manually for each page, it could under count because of the deleted row or over count because of a new row. Querying through offset will result in duplicate or missing results if your data is ever-changing.


1 Answers

As pointed out by @usr, in most variants of this problem (whether querying a database, API, or some other entity) it is preferable to reduce the number of calls as much as possible, since returning a few extra rows is almost always cheaper than issuing a separate call. The following PageSizeConversion algorithm will always find a single call that returns the fewest records possible (which is exactly how it performs the search). There may be some extra records returned at the beginning (headWaste) or end (tailWaste) of the data set, required to fit the data set within a single page. The algorithm is implemented in Javascript here, but it is easy to port to any language.

function PageSizeConversion(offset, limit) {
  var window, leftShift;
  for (window = limit; window <= offset + limit; window++) {
    for (leftShift = 0; leftShift <= window - limit; leftShift++) {
      if ((offset - leftShift) % window == 0) {
        this.pageSize = window;
        this.page = (offset - leftShift) / this.pageSize;

        this.headWaste = leftShift;
        this.tailWaste = ((this.page + 1) * this.pageSize) - (offset + limit);
        return;
      }
    }
  }
}

var testData = [
  {"offset": 0,"limit": 10,"expectedPage": 0,"expectedSize": 10,"expectedHeadWaste": 0,"expectedTailWaste": 0},
  {"offset": 2,"limit": 1,"expectedPage": 2,"expectedSize": 1,"expectedHeadWaste": 0,"expectedTailWaste": 0},
  {"offset": 2,"limit": 2,"expectedPage": 1,"expectedSize": 2,"expectedHeadWaste": 0,"expectedTailWaste": 0},
  {"offset": 5,"limit": 3,"expectedPage": 1,"expectedSize": 4,"expectedHeadWaste": 1,"expectedTailWaste": 0},
  {"offset": 3,"limit": 5,"expectedPage": 0,"expectedSize": 8,"expectedHeadWaste": 3,"expectedTailWaste": 0},
  {"offset": 7,"limit": 3,"expectedPage": 1,"expectedSize": 5,"expectedHeadWaste": 2,"expectedTailWaste": 0},
  {"offset": 1030,"limit": 135,"expectedPage": 7,"expectedSize": 146,"expectedHeadWaste": 8,"expectedTailWaste": 3},
];

describe("PageSizeConversion Tests", function() {
  testData.forEach(function(testItem) {
    it("should return correct conversion for offset " + testItem.offset + " limit " + testItem.limit, function() {
      conversion = new PageSizeConversion(testItem.offset, testItem.limit);
      expect(conversion.page).toEqual(testItem.expectedPage);
      expect(conversion.pageSize).toEqual(testItem.expectedSize);
      expect(conversion.headWaste).toEqual(testItem.expectedHeadWaste);
      expect(conversion.tailWaste).toEqual(testItem.expectedTailWaste);
    });
  });
});

// load jasmine htmlReporter
(function() {
  var env = jasmine.getEnv();
  env.addReporter(new jasmine.HtmlReporter());
  env.execute();
}());
<script src="https://cdn.jsdelivr.net/jasmine/1.3.1/jasmine.js"></script>
<script src="https://cdn.jsdelivr.net/jasmine/1.3.1/jasmine-html.js"></script>
<link href="https://cdn.jsdelivr.net/jasmine/1.3.1/jasmine.css" rel="stylesheet" />
<title>Jasmine Spec Runner</title>

This is perhaps not exactly what @Martijn is looking for, since it occasionally produces a result with a lot of waste. But in most cases it seems like a good solution to the general problem.

like image 76
mkasberg Avatar answered Sep 21 '22 00:09

mkasberg