This should be an interesting challenge. I'm looking for an algorithm that doesn't exist yet (to the best of my knowledge). The database can only be accessed per page, through `getFromDatabase(int page, int size)`, but I need to implement `getRecords(int offset, int limit)`. Somehow we must use the given `offset` and `limit` to retrieve the matching database records, which can only be accessed by `page` and `size`. Obviously, offset/limit won't always map to a single page/size.

The challenge is finding an algorithm that makes the "ideal" number of `getFromDatabase` calls to retrieve all records. This algorithm should take several factors into account: `getFromDatabase` has a certain overhead cost, so keep calls to a minimum.

I've come up with the following algorithm: http://jsfiddle.net/mwvdlee/A7J9C/ (JS code, but the algorithm is language-agnostic). Essentially it is the following pseudocode:
```
do {
    do {
        try to convert (offset, limit) to (page, size)
        if too much waste
            lower limit by some amount
        else
            call `getFromDatabase()`
            filter out waste records
            increase offset to first record not yet retrieved
            lower limit to last record not yet retrieved
    } until some records were retrieved
} until all records are retrieved from database
```
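For concreteness, here is a minimal runnable sketch of that loop in JavaScript. The `tryConvert` helper (the (offset, limit) → (page, size) search), the `maxWaste` threshold, and `getFromDatabase` are hypothetical placeholders, not code from the fiddle:

```js
// Sketch only: `tryConvert`, `maxWaste`, and `getFromDatabase` are
// hypothetical stand-ins; end-of-data handling is omitted.
function getRecords(offset, limit, getFromDatabase, maxWaste) {
  var result = [];
  while (limit > 0) {                       // until all records are retrieved
    var batch = [];
    var curLimit = limit;
    while (batch.length === 0) {            // until some records were retrieved
      var c = tryConvert(offset, curLimit); // yields page, size, headWaste, waste
      if (c.waste > maxWaste && curLimit > 1) {
        curLimit--;                         // lower limit by some amount
      } else {
        var page = getFromDatabase(c.page, c.size);
        batch = page.slice(c.headWaste, c.headWaste + curLimit); // drop waste
      }
    }
    result = result.concat(batch);
    offset += batch.length;                 // advance past retrieved records
    limit -= batch.length;
  }
  return result;
}
```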
The key of this algorithm is in determining "too much waste" and "some amount". But this algorithm isn't optimal, nor is it guaranteed to be complete (it may well be, I just can't prove it).

Are there any better (known?) algorithms, or improvements I could make? Does anybody have good ideas of how to tackle this problem?
The `limit` option restricts the number of rows returned by a query, while `offset` skips a specified number of rows before the start of the result set. Used together, the query first skips `offset` rows and then returns at most `limit` rows.

Offset-based pagination is a well-known technique in which the client passes a limit (the number of results) and an offset (the number of records to skip). It is easy to use and works best for static data.

Because the offset counts rows anew for each page, it can under-count after a row is deleted or over-count after a row is inserted. Querying through an offset will therefore yield duplicate or missing results if your data is ever-changing.
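In array terms, the semantics are roughly those of `slice`; a simplified JavaScript illustration (not how a database actually executes it):

```js
// offset/limit over an in-memory array; a database applies the same
// semantics: skip `offset` rows, then return at most `limit` rows.
function offsetLimit(rows, offset, limit) {
  return rows.slice(offset, offset + limit);
}

offsetLimit(["a", "b", "c", "d", "e"], 2, 2); // => ["c", "d"]
```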
As pointed out by @usr, in most variants of this problem (whether querying a database, an API, or some other entity) it is preferable to reduce the number of calls as much as possible, since returning a few extra rows is almost always cheaper than issuing a separate call. The following `PageSizeConversion` algorithm always finds a single call that returns the fewest records possible (which is exactly how it performs the search). There may be some extra records returned at the beginning (`headWaste`) or end (`tailWaste`) of the data set, required to fit the data set within a single page. The algorithm is implemented in JavaScript below, but it is easy to port to any language.
```js
// Finds the smallest page size >= limit such that the requested range
// [offset, offset + limit) fits entirely within a single page.
function PageSizeConversion(offset, limit) {
  var window, leftShift;
  // `window` is the candidate page size; it never needs to exceed
  // offset + limit, because a page of that size starting at 0 always fits.
  for (window = limit; window <= offset + limit; window++) {
    // `leftShift` is how far before `offset` the page may start.
    for (leftShift = 0; leftShift <= window - limit; leftShift++) {
      // A page boundary must land exactly on (offset - leftShift).
      if ((offset - leftShift) % window == 0) {
        this.pageSize = window;
        this.page = (offset - leftShift) / this.pageSize;
        this.headWaste = leftShift; // extra records before the range
        this.tailWaste = ((this.page + 1) * this.pageSize) - (offset + limit); // extra records after it
        return;
      }
    }
  }
}
var testData = [
  {"offset": 0, "limit": 10, "expectedPage": 0, "expectedSize": 10, "expectedHeadWaste": 0, "expectedTailWaste": 0},
  {"offset": 2, "limit": 1, "expectedPage": 2, "expectedSize": 1, "expectedHeadWaste": 0, "expectedTailWaste": 0},
  {"offset": 2, "limit": 2, "expectedPage": 1, "expectedSize": 2, "expectedHeadWaste": 0, "expectedTailWaste": 0},
  {"offset": 5, "limit": 3, "expectedPage": 1, "expectedSize": 4, "expectedHeadWaste": 1, "expectedTailWaste": 0},
  {"offset": 3, "limit": 5, "expectedPage": 0, "expectedSize": 8, "expectedHeadWaste": 3, "expectedTailWaste": 0},
  {"offset": 7, "limit": 3, "expectedPage": 1, "expectedSize": 5, "expectedHeadWaste": 2, "expectedTailWaste": 0},
  {"offset": 1030, "limit": 135, "expectedPage": 7, "expectedSize": 146, "expectedHeadWaste": 8, "expectedTailWaste": 3}
];
describe("PageSizeConversion Tests", function() {
testData.forEach(function(testItem) {
it("should return correct conversion for offset " + testItem.offset + " limit " + testItem.limit, function() {
conversion = new PageSizeConversion(testItem.offset, testItem.limit);
expect(conversion.page).toEqual(testItem.expectedPage);
expect(conversion.pageSize).toEqual(testItem.expectedSize);
expect(conversion.headWaste).toEqual(testItem.expectedHeadWaste);
expect(conversion.tailWaste).toEqual(testItem.expectedTailWaste);
});
});
});
// load jasmine htmlReporter
(function() {
  var env = jasmine.getEnv();
  env.addReporter(new jasmine.HtmlReporter());
  env.execute();
}());
```
```html
<script src="https://cdn.jsdelivr.net/jasmine/1.3.1/jasmine.js"></script>
<script src="https://cdn.jsdelivr.net/jasmine/1.3.1/jasmine-html.js"></script>
<link href="https://cdn.jsdelivr.net/jasmine/1.3.1/jasmine.css" rel="stylesheet" />
<title>Jasmine Spec Runner</title>
```
This is perhaps not exactly what @Martijn is looking for, since it occasionally produces a result with a lot of waste. But in most cases it seems like a good solution to the general problem.
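To actually retrieve the records, a caller would make the single page call chosen by the conversion and trim the waste. A minimal sketch, assuming a `getFromDatabase(page, size)` function as described in the question:

```js
// Fetch the one page selected by PageSizeConversion, then strip the
// headWaste records in front and the tailWaste records at the end.
function getRecords(offset, limit, getFromDatabase) {
  var c = new PageSizeConversion(offset, limit);
  var page = getFromDatabase(c.page, c.pageSize);
  return page.slice(c.headWaste, c.headWaste + limit);
}
```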