So I have this RESTful API with a collection people, which can be called like this:
http://example.com/people?lastname=smith
Which returns a JSON response like this:
{
    "page": 0,
    "next": 1,
    "total": 5000000,
    "people": [
        {
            "firstname": "John",
            "lastname": "Smith",
            "age": 32
        },
        {
            "firstname": "Adam",
            "lastname": "Smith",
            "age": 84
        },
        ...
    ]
}
I want to write a Python generator that yields each person from the response. When it reaches the last person, if there is a next page, it should make the request for the next page with
http://example.com/people?lastname=smith&page=1
and continue iterating over the results seamlessly. The resulting class call would be as simple as:
client = PeopleClient("http://example.com/people")
smiths = client.get_people_by_last_name("smith")
where I would then be able to iterate over every "Smith" in smiths; through all 5 million, if necessary.
Any ideas on how to make this happen or if it is even possible?
Using the answer from @ali-afshar as a guide, this implementation should work for the hypothetical REST API:
import requests


class PeopleClient:
    def __init__(self, url):
        self._url = url

    def _get_people(self, **kwargs):
        # Perform the request and return the parsed JSON body as a dict.
        response = requests.get(self._url, params=kwargs)
        response.raise_for_status()
        return response.json()

    def get_people_by_last_name(self, lastname):
        current_page = 0
        while current_page >= 0:
            result = self._get_people(lastname=lastname, page=current_page)
            for person in result.get("people", []):
                yield person
            # The API includes a "next" field while more pages exist;
            # fall back to -1 to end the loop on the last page.
            current_page = result.get("next", -1)
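With that in place, consumption stays lazy. As a short sketch of how the generator could be used without fetching every page, itertools.islice pulls only the first few people, so only the first request is made (the client and endpoint are the hypothetical ones from the question, and the field names come from the sample JSON):

from itertools import islice

client = PeopleClient("http://example.com/people")
smiths = client.get_people_by_last_name("smith")

# Only page 0 is requested, because islice stops after 10 people.
for person in islice(smiths, 10):
    print(person["firstname"], person["lastname"])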
The yield keyword in Python is similar to a return statement in that it hands values back to the caller. The difference is that a function containing yield returns a generator object to its caller, which produces values lazily one at a time, instead of computing and returning a single value.
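A tiny illustration of that difference, independent of any API:

def count_up_to(n):
    # Produces values one at a time instead of building a list.
    for i in range(1, n + 1):
        yield i

gen = count_up_to(3)
print(gen)        # <generator object count_up_to at 0x...>
print(list(gen))  # [1, 2, 3]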
Short of writing your code for you, you want to take advantage of Python's generators, rather than realizing the whole set as a list. This way you can start using the results immediately and only perform paged requests when you get to the end of a page.
for person in PeopleClient("http://ex..").get_people_by_last_name("smith"):
    # Do something with the person
    ...
Secondly, your implementation of the actual request should take a page parameter which you can increment, and which can be called by the wrapper generator.
def get_people_page(name, page):
    # Perform the HTTP request, using page=page
    ...
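A minimal sketch of that request function, assuming the requests library and the endpoint from the question, with field names mirroring the sample JSON above:

import requests

def get_people_page(name, page):
    # One HTTP request per page; returns the list of people on that page.
    response = requests.get(
        "http://example.com/people",
        params={"lastname": name, "page": page},
    )
    response.raise_for_status()
    return response.json().get("people", [])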
The generator itself will be something like:
def get_all_people(name):
    page = 0
    has_more = True
    while has_more:
        for person in get_people_page(name, page):
            yield person
        page += 1
        # Calculate has_more by whether you have a next link,
        # whether the result set is empty,
        # or whether you get an error.
        has_more = ...
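For instance, with the get_people_page sketch above returning a plain list, an empty result set is one reasonable end-of-data signal (this is just one of the options named in the comments, not the only way):

def get_all_people(name):
    page = 0
    has_more = True
    while has_more:
        people = get_people_page(name, page)
        for person in people:
            yield person
        page += 1
        # An empty page means there is nothing left to fetch.
        has_more = bool(people)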
Here's my generator solution, which I think is a touch cleaner, and when working with a specified per_page it saves you an extra needless request.
def get_all(self, per_page=100):
    page = 0
    while True:
        items = self.api.get(per_page=per_page, page=page)
        for item in items:
            yield item
        # A short page means this was the last one, so we stop
        # without issuing an extra empty request.
        if len(items) < per_page:
            break
        page += 1

# Called on the instance that owns self.api (the name "client" is illustrative):
all_items = list(client.get_all())
The self.api.get() must accept page and per_page params.
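For completeness, one hypothetical shape of that self.api wrapper (the class name, attribute names, and per_page query parameter here are illustrative assumptions, not part of the original answer), again using requests against the endpoint from the question:

import requests

class PeopleAPI:
    # Hypothetical thin wrapper; only the get() signature matters here.
    def __init__(self, base_url):
        self.base_url = base_url

    def get(self, per_page, page):
        response = requests.get(
            self.base_url,
            params={"per_page": per_page, "page": page},
        )
        response.raise_for_status()
        return response.json().get("people", [])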