
Python requests arguments/dealing with api pagination

I'm playing around with the Angel List (AL) API and want to pull all jobs in San Francisco. Since I couldn't find an active Python wrapper for the API (if I make any headway, I think I'd like to make my own), I'm using the requests library.

The AL API's results are paginated, and I can't figure out how to move beyond the first page of the results.

Here is my code:

import requests

r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
r_sanfran.keys()  # returns [u'per_page', u'last_page', u'total', u'jobs', u'page']
r_sanfran['last_page']  # returns 16
r_sanfran['page']  # returns 1

I tried adding arguments to requests.get, but that didn't work. I also tried something really dumb: changing the value of the 'page' key, as if that would magically paginate for me.

e.g. r_sanfran['page'] = 2
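For context, this is how requests turns its params argument into a query string. The URL is the one from the question; whether the AL endpoint actually honors a page parameter is an assumption here, but the encoding itself can be checked without hitting the network:

```python
import requests

# Hypothetical: assumes the endpoint accepts a ?page=N query parameter.
url = "https://api.angel.co/1/tags/1664/jobs"

# requests serializes the params dict into the query string; preparing
# the request does not send anything over the network.
prepared = requests.Request("GET", url, params={"page": 2}).prepare()
print(prepared.url)  # https://api.angel.co/1/tags/1664/jobs?page=2
```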

I'm guessing it's something relatively simple, but I can't seem to figure it out so any help would be awesome.

Thanks as always.

Angel List API documentation if it's helpful.

asked Jul 21 '13 by crock1255

People also ask

How do I Paginate JSON data in Python?

Paginated JSON will usually have an object with links to the previous and next JSON pages. To get the previous page, you must send a request to the "prev" URL. To get to the next page, you must send a request to the "next" URL. This will deliver a new JSON with new results and new links for the next and previous pages.
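The link-following pattern described above can be sketched as follows. This uses an in-memory stand-in for the HTTP layer (the page contents and "next" URLs are made up); in real code the fetch function would be requests.get(url).json():

```python
# Fake paginated API: each page carries its results and a "next" link.
pages = {
    "/items?page=1": {"results": [1, 2], "next": "/items?page=2"},
    "/items?page=2": {"results": [3, 4], "next": "/items?page=3"},
    "/items?page=3": {"results": [5], "next": None},
}

def fetch(url):
    # Stand-in for requests.get(url).json()
    return pages[url]

def all_results(start_url):
    url = start_url
    while url is not None:
        page = fetch(url)
        yield from page["results"]
        url = page["next"]  # follow the "next" link until it is absent

print(list(all_results("/items?page=1")))  # [1, 2, 3, 4, 5]
```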


1 Answer

Improving on @alecxe's answer: if you use a Python generator and a requests HTTP session, you can improve performance and resource usage when querying many pages or very large pages.

import requests

session = requests.Session()

def get_jobs():
    url = "https://api.angel.co/1/tags/1664/jobs"

    first_page = session.get(url).json()
    yield first_page
    num_pages = first_page['last_page']

    for page in range(2, num_pages + 1):
        next_page = session.get(url, params={'page': page}).json()
        yield next_page

for page in get_jobs():
    # TODO: process the page
    pass
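One nice property of the generator approach: pages are fetched lazily, so you can stop early without downloading everything. A small illustration with a stand-in for the HTTP calls (no network; the page structure here is made up):

```python
from itertools import islice

fetched = []

def get_pages(num_pages=16):
    # Stand-in for the session.get(...).json() calls above; records
    # which pages were actually "fetched".
    for page in range(1, num_pages + 1):
        fetched.append(page)
        yield {"page": page, "jobs": []}

# islice stops pulling from the generator after three pages, so only
# three "requests" are ever made out of the sixteen available.
first_three = list(islice(get_pages(), 3))
print(len(first_three), fetched)  # 3 [1, 2, 3]
```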
answered Oct 04 '22 by dh762