So I have this RESTful API with a collection people, which can be called like this:
http://example.com/people?lastname=smith
Which returns a JSON response like this:
{
    "page": 0,
    "next": 1,
    "total": 5000000,
    "people": [
        {
            "firstname": "John",
            "lastname": "Smith",
            "age": 32
        },
        {
            "firstname": "Adam",
            "lastname": "Smith",
            "age": 84
        },
        ...
    ]
}
I want to write a Python generator that yields each person from the response. When it reaches the last person, if there is a next page, it should make the request for the next page with
http://example.com/people?lastname=smith&page=1
and continue iterating over the results seamlessly. The resulting class call would be as simple as:
client = PeopleClient("http://example.com/people")
smiths = client.get_people_by_last_name("smith")
where I would then be able to iterate over every "Smith" in smiths; through all 5 million, if necessary.
Any ideas on how to make this happen or if it is even possible?
Using the answer from @ali-afshar as a guide, this implementation should work for the hypothetical REST API:
import requests


class PeopleClient:
    def __init__(self, url):
        self._url = url

    def _get_people(self, **kwargs):
        # Perform the request and return the parsed JSON body as a dict.
        response = requests.get(self._url, params=kwargs)
        response.raise_for_status()
        return response.json()

    def get_people_by_last_name(self, lastname):
        current_page = 0
        while current_page >= 0:
            result = self._get_people(lastname=lastname, page=current_page)
            for person in result.get("people", []):
                yield person
            # The API includes a "next" field while more pages exist;
            # fall back to -1 to end the loop on the last page.
            current_page = result.get("next", -1)
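With that in place, consumption stays lazy. As a short sketch of how the generator could be used without fetching every page, itertools.islice pulls only the first few people, so only the first request is made (the client and endpoint are the hypothetical ones from the question, and the field names come from the sample JSON):

from itertools import islice

client = PeopleClient("http://example.com/people")
smiths = client.get_people_by_last_name("smith")

# Only page 0 is requested, because islice stops after 10 people.
for person in islice(smiths, 10):
    print(person["firstname"], person["lastname"])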
The yield keyword in Python is similar to a return statement in that it hands values back to the caller. The difference is that a function containing yield returns a generator object to its caller, which produces values lazily one at a time, instead of computing and returning a single value.
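A tiny illustration of that difference, independent of any API:

def count_up_to(n):
    # Produces values one at a time instead of building a list.
    for i in range(1, n + 1):
        yield i

gen = count_up_to(3)
print(gen)        # <generator object count_up_to at 0x...>
print(list(gen))  # [1, 2, 3]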
Short of writing your code for you, you want to take advantage of Python's generators, rather than realizing the whole set as a list. This way you can start using the results immediately and only perform paged requests when you get to the end of a page.
for person in PeopleClient("http://ex..").get_people_by_last_name("smith"):
    # Do something with the person
    ...
Secondly, your implementation of the actual request should take a page parameter which you can increment, and which can be called by the wrapper generator.
def get_people_page(name, page):
    # Perform the HTTP request, using page=page
    ...
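A minimal sketch of that request function, assuming the requests library and the endpoint from the question, with field names mirroring the sample JSON above:

import requests

def get_people_page(name, page):
    # One HTTP request per page; returns the list of people on that page.
    response = requests.get(
        "http://example.com/people",
        params={"lastname": name, "page": page},
    )
    response.raise_for_status()
    return response.json().get("people", [])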
The generator itself will be something like:
def get_all_people(name):
    page = 0
    has_more = True
    while has_more:
        for person in get_people_page(name, page):
            yield person
        page += 1
        # Calculate has_more by whether you have a next link,
        # whether the result set is empty,
        # or whether you get an error.
        has_more = ...
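For instance, with the get_people_page sketch above returning a plain list, an empty result set is one reasonable end-of-data signal (this is just one of the options named in the comments, not the only way):

def get_all_people(name):
    page = 0
    has_more = True
    while has_more:
        people = get_people_page(name, page)
        for person in people:
            yield person
        page += 1
        # An empty page means there is nothing left to fetch.
        has_more = bool(people)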
Here's my generator solution, which I think is a touch cleaner, and when working with a specified per_page it saves you an extra needless request.
def get_all(self, per_page=100):
    page = 0
    while True:
        items = self.api.get(per_page=per_page, page=page)
        for item in items:
            yield item
        # A short page means this was the last one, so we stop
        # without issuing an extra empty request.
        if len(items) < per_page:
            break
        page += 1

# Called on the instance that owns self.api (the name "client" is illustrative):
all_items = list(client.get_all())
The self.api.get() must accept page and per_page params.
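For completeness, one hypothetical shape of that self.api wrapper (the class name, attribute names, and per_page query parameter here are illustrative assumptions, not part of the original answer), again using requests against the endpoint from the question:

import requests

class PeopleAPI:
    # Hypothetical thin wrapper; only the get() signature matters here.
    def __init__(self, base_url):
        self.base_url = base_url

    def get(self, per_page, page):
        response = requests.get(
            self.base_url,
            params={"per_page": per_page, "page": page},
        )
        response.raise_for_status()
        return response.json().get("people", [])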