 

Pagination with AWS DynamoDB with PHP

Does anyone have any idea about paginating records from a table? I want to create a pagination component in PHP with DynamoDB.

It seems it is not possible to offer pagination like <first> <prev> 1,2,3,4,5... <next> <last>.

Because DynamoDB only provides a Limit clause, with which we can read a certain number of records, and we can fetch the next n records via LastEvaluatedKey. So if I want to jump directly to the 5th page, how is that possible?

As per my understanding, we can't display page numbers in the pagination. All we can do is read a certain limit of records and provide a NEXT link to retrieve the next n records.

Pagination is a basic feature of any web application. How can we implement it when migrating to a cloud database like DynamoDB?

Please provide your views and suggestions. Thanks

Kam Paul asked May 03 '14




2 Answers

Yes, you are right: there is no OFFSET in DynamoDB. But using only Limit and LastEvaluatedKey, I made this function:

public function scan($table, $filter = [], $select = null, $limit = 2)
{
    // Page number requested by the client (0-based)
    $page = isset($_GET['page']) ? (int) $_GET['page'] : 0;
    $options = [
        'TableName' => $table,
        'Count' => true,
    ];

    if (!empty($limit)) {
        $options['Limit'] = $limit;
    }

    if (!is_null($select)) {
        $options['Select'] = $select;
    }

    if (!empty($filter)) {
        $options['ScanFilter'] = $filter;
    }

    $results = $this->_client->scan($options);

    // Advance one Limit-sized scan at a time until we reach the
    // requested page or run out of data.
    while ($page > 0 && isset($results['LastEvaluatedKey'])) {
        $options['ExclusiveStartKey'] = $results['LastEvaluatedKey'];
        $results = $this->_client->scan($options);
        $page--;
    }

    return $results;
}

$this->_client refers to the DynamoDB client object.
Basically I loop through the entries using LastEvaluatedKey until I reach the needed page.
To get the total number of entries in the table, call $this->scan($this->tableName(), [], null, null)['Count']; (that is, without any search criteria and without a limit, just as a normal pagination function would).
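
For illustration, a caller might use the function above like this (the 'comments' table name and the link markup are made up for the example):

    // Hypothetical usage of the scan() helper above.
    $page    = isset($_GET['page']) ? (int) $_GET['page'] : 0;
    $results = $this->scan('comments', [], null, 10);

    foreach ($results['Items'] as $item) {
        // ... render each item
    }

    // Only offer a Next link if DynamoDB reported more data to read.
    if (isset($results['LastEvaluatedKey'])) {
        echo '<a href="?page=' . ($page + 1) . '">Next</a>';
    }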

Justinas answered Sep 28 '22


To add to @Justinas' answer, DynamoDB will have pretty horrible performance for pagination if random access (i.e., jumping to an arbitrary page) is desired. However, if you only offer next page and previous page, you can pass around the LastEvaluatedKey and keep the scanning overhead to a minimum.
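
For the next/previous-only case, one way to pass LastEvaluatedKey around (sketched here; the base64/JSON encoding is my own choice, not part of the answer) is to serialize it into the Next link as an opaque cursor:

    // Sketch: carry LastEvaluatedKey between requests as a URL cursor.
    $options = ['TableName' => 'comments', 'Limit' => 10];

    if (!empty($_GET['cursor'])) {
        $options['ExclusiveStartKey'] = json_decode(
            base64_decode($_GET['cursor']), true
        );
    }

    $results = $this->_client->scan($options);

    // If there is more data, emit a Next link carrying the cursor.
    if (isset($results['LastEvaluatedKey'])) {
        $cursor = base64_encode(json_encode($results['LastEvaluatedKey']));
        echo '<a href="?cursor=' . urlencode($cursor) . '">Next</a>';
    }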

As stated in the comments, you should definitely cache the results as much as possible. At a minimum, the LastEvaluatedKey results could be cached so that they don't need to be recomputed for each paging request when users page through the results. Here's an example of what I mean:

Say you have a table that has a schema like this, where CommentID is the hash-key.

 CommentID | Author | Comment | ...
-----------+--------+---------+------------
    1      | Joe    | Foo     | ...
    2      | Joe    | Bar     | ...
    3      | John   | Baz     | ...
    4      | Joe    | FooBar  | ...
    5      | Jane   | BooBaz  | ...
    6      | Joesie | Blah    | ...
    7      | Johnny | Blahaha | ...

When you start paging, say you request 3 comments per page. You will get the first page of results and LastEvaluatedKey = 3. If you then make a second scan request for page 2 using ExclusiveStartKey = 3, you will get LastEvaluatedKey = 6. To get page 3 you would make another scan using ExclusiveStartKey = 6, and so on.

You can see that without any kind of caching, to get page 3 you are performing three scans (two of which would have been repeated if you had also requested pages 1 and 2 prior to page 3). So, my proposed optimization is to store the corresponding keys for each page. You would end up with a map like this:

 Page | Hash-Key
------+----------
   1  |   null
   2  |     3
   3  |     6
  ..  |    ... 

And the values would get filled out as you page through results. Now, when a user wants page 3, all you have to do is one scan, using 6 as the ExclusiveStartKey.

Of course you would need a look-up table like this for each page size, and the table would only be accurate until new rows get added (or removed). That said, if you have lots of requests, the extra memory needed to store the paging cache would be well worth it. All that's left then is to set a reasonable expiration for your paging cache, depending on how often data is added to (or removed from) your table.
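
As a rough sketch of that optimization (the session-based cache store and the scanPage method name are made up for illustration), the page-to-key map can be filled in lazily as users page through:

    // Sketch: cache page => ExclusiveStartKey per table and page size.
    public function scanPage($table, $pageSize, $page)
    {
        $cacheKey = "paging:$table:$pageSize";
        $keys = isset($_SESSION[$cacheKey]) ? $_SESSION[$cacheKey] : [1 => null];

        // Fall back to the closest cached start key at or before this page.
        $start = $page;
        while ($start > 1 && !array_key_exists($start, $keys)) {
            $start--;
        }

        $options = ['TableName' => $table, 'Limit' => $pageSize];
        if ($keys[$start] !== null) {
            $options['ExclusiveStartKey'] = $keys[$start];
        }

        $results = $this->_client->scan($options);

        // Scan forward to the requested page, caching keys as we go.
        for ($p = $start; $p < $page && isset($results['LastEvaluatedKey']); $p++) {
            $keys[$p + 1] = $results['LastEvaluatedKey'];
            $options['ExclusiveStartKey'] = $results['LastEvaluatedKey'];
            $results = $this->_client->scan($options);
        }

        $_SESSION[$cacheKey] = $keys;
        return $results;
    }

With the cache warm, a request for page 3 costs a single scan starting at the cached key, instead of three.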

Mike Dinescu answered Sep 28 '22