
Python generator for paged API resource

So I have this RESTful API with a collection people, which can be called like this:

http://example.com/people?lastname=smith

Which returns a JSON response like this:

    {
      "page": 0,
      "next": 1,
      "total": 5000000,
      "people": [
        { 
          "firstname": "John",
          "lastname": "Smith",
          "age": 32
        },
        { 
          "firstname": "Adam",
          "lastname": "Smith",
          "age": 84
        },
        ...
      ]
    }

I want to write a Python generator that yields each person from the response. When it reaches the last person on a page and there is a next page, it should request that page with http://example.com/people?lastname=smith&page=1 and continue iterating over the results seamlessly. Using the resulting class would be as simple as:

    client = PeopleClient("http://example.com/people")
    smiths = client.get_people_by_last_name("smith")

Where I would then be able to iterate over every "Smith" in smiths; through all 5 million, if necessary.

Any ideas on how to make this happen or if it is even possible?

Update

Using the answer from @ali-afshar as a guide, this implementation should work for the hypothetical REST API:

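    import requests

    class PeopleClient:
        def __init__(self, url):
            self._url = url

        def _get_people(self, **kwargs):
            # Decode the JSON body so callers get a dict, not a Response object.
            response = requests.get(self._url, params=kwargs)
            response.raise_for_status()
            return response.json()

        def get_people_by_last_name(self, lastname):
            current_page = 0
            # Stop when "next" is missing or null (assumed end-of-results signal).
            while current_page is not None and current_page >= 0:
                result = self._get_people(lastname=lastname, page=current_page)
                for person in result.get("people", []):
                    yield person

                current_page = result.get("next", -1)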
jeremyswitzer asked Jul 17 '13 14:07


2 Answers

Short of writing your code for you, you want to take advantage of Python's generators, rather than realizing the whole set as a list. This way you can start using the results immediately and only perform paged requests when you get to the end of a page.

    for person in PeopleClient("http://ex..").get_people_by_last_name("smith"):
        # Do something with the person
        pass
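
To make the "only perform paged requests when you get to the end of a page" point concrete, here is a minimal sketch. It uses a hypothetical in-memory stand-in (FAKE_PAGES, fetch_page) instead of a real HTTP call, and it assumes an empty page marks the end of the results. The names are illustrative and not part of any real API.

```python
from itertools import islice

# Hypothetical in-memory stand-in for the paged API: page number -> people.
FAKE_PAGES = {
    0: ["John Smith", "Adam Smith"],
    1: ["Jane Smith", "Will Smith"],
    2: ["Zoe Smith"],
}
requested_pages = []  # records which pages were actually "fetched"


def fetch_page(page):
    """Pretend to perform one HTTP request for a single page."""
    requested_pages.append(page)
    return FAKE_PAGES.get(page, [])


def iter_people():
    """Yield people one at a time, fetching a page only when it is needed."""
    page = 0
    while True:
        people = fetch_page(page)
        if not people:  # assumption: an empty page means no more results
            return
        for person in people:
            yield person
        page += 1


# Take only the first three people; the generator is never asked for more.
first_three = list(islice(iter_people(), 3))
print(first_three)
print(requested_pages)
```

Because the consumer stops after three people, page 2 is never requested. Building the whole result as a list first would have fetched every page up front.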

Secondly, your implementation of the actual request should take a page parameter which you can increment, and which can be called by the wrapper generator.

    def get_people_page(name, page):
        # Perform the HTTP request, using page=page,
        # and return the people on that page
        raise NotImplementedError

The generator itself will be something like:

    def get_all_people(name):
        page = 0
        has_more = True
        while has_more:
            people = get_people_page(name, page)
            for person in people:
                yield person
            page += 1
            # Here has_more is based on whether the results set is empty;
            # it could instead check for a next link or for an error.
            has_more = bool(people)
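
One way to fill in the has_more check is to follow the "next" field from the question's JSON. This is only a sketch: it assumes the last page reports "next" as null or omits it, which the question does not state. FAKE_RESPONSES and the stubbed get_people_page are hypothetical stand-ins for the real HTTP call.

```python
# Hypothetical responses shaped like the JSON shown in the question.
FAKE_RESPONSES = {
    0: {
        "page": 0,
        "next": 1,
        "people": [{"firstname": "John"}, {"firstname": "Adam"}],
    },
    1: {
        "page": 1,
        "next": None,  # assumption: null marks the last page
        "people": [{"firstname": "Jane"}],
    },
}


def get_people_page(name, page):
    """Stand-in for the HTTP call; a real version would request page=page."""
    return FAKE_RESPONSES[page]


def get_all_people(name):
    """Yield every person, following the "next" field from page to page."""
    page = 0
    while page is not None:
        response = get_people_page(name, page)
        for person in response.get("people", []):
            yield person
        # A missing or null "next" ends the loop.
        page = response.get("next")


names = [person["firstname"] for person in get_all_people("smith")]
print(names)
```

Driving the loop from "next" means the client never has to guess page numbers; it stops as soon as the server stops advertising another page.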
Ali Afshar answered Oct 13 '22 12:10



Here's my generator solution which I think is a touch cleaner, and when working with a specified per_page it saves you an extra needless request.

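    def get_all(self, per_page=100):
        page = 0
        while True:
            items = self.api.get(per_page=per_page, page=page)

            for item in items:
                yield item

            # A short page means there is nothing left to fetch.
            if len(items) < per_page:
                break

            page += 1

    # get_all is a method; client is whatever object defines it and holds self.api
    all_items = list(client.get_all())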

The self.api.get() must accept a page and per_page param.
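
To see where the saved request comes from, here is a small sketch that counts calls. FakeApi and Client are hypothetical names invented for this illustration; FakeApi serves slices of an in-memory list in place of a real HTTP client.

```python
class FakeApi:
    """Hypothetical API client that serves slices of an in-memory list."""

    def __init__(self, items):
        self.items = items
        self.calls = 0  # number of simulated requests

    def get(self, per_page, page):
        self.calls += 1
        start = page * per_page
        return self.items[start:start + per_page]


class Client:
    def __init__(self, api):
        self.api = api

    def get_all(self, per_page=100):
        page = 0
        while True:
            items = self.api.get(per_page=per_page, page=page)
            for item in items:
                yield item
            # A page shorter than per_page must be the last one.
            if len(items) < per_page:
                break
            page += 1


api = FakeApi(list(range(5)))
client = Client(api)
all_items = list(client.get_all(per_page=2))
print(all_items, api.calls)
```

With five items and per_page=2, the third page comes back with a single item, so the loop stops after three requests. A loop that waits for an empty page would need a fourth. When the total is an exact multiple of per_page, the last page is full, so one extra request returning an empty page is still made.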

jmoz answered Oct 13 '22 13:10
