How to loop through a paginated API using Python

I need to retrieve the 500 most popular films from a REST API, but the results are limited to 20 per page and I am only able to make 40 calls every 10 seconds (https://developers.themoviedb.org/3/getting-started/request-rate-limiting). I am unable to loop through the paginated results dynamically, so that the 500 most popular results are in a single list.

I can successfully return the top 20 most popular films (see below) and enumerate the number of the film, but I am getting stuck working through the loop that allows me to paginate through the top 500 without timing out due to the API rate limit.

import requests #to make TMDB API calls

# Discover API URL filtered to movies >= 2004 and containing Drama (genre ID 18)
api_key = 'my api key'  # placeholder: substitute your own TMDB key
discover_api = (
    'https://api.themoviedb.org/3/discover/movie'
    f'?api_key={api_key}&language=en-US&sort_by=popularity.desc'
    '&include_adult=false&include_video=false'
    '&primary_release_year=>%3D2004&with_genres=18'
)

#Returning all drama films >= 2004 in popularity desc
discover_api = requests.get(discover_api).json()

most_popular_films = discover_api['results']

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])


Sample response:

{
  "page": 1,
  "total_results": 101685,
  "total_pages": 5085,
  "results": [
    {
      "vote_count": 13,
      "id": 280960,
      "video": false,
      "vote_average": 5.2,
      "title": "Catarina and the others",
      "popularity": 130.491,
      "poster_path": "/kZMCbp0o46Tsg43omSHNHJKNTx9.jpg",
      "original_language": "pt",
      "original_title": "Catarina e os Outros",
      "genre_ids": [
        18,
        9648
      ],
      "backdrop_path": "/9nDiMhvL3FtaWMsvvvzQIuq276X.jpg",
      "adult": false,
      "overview": "Outside, the first sun rays break the dawn.  Sixteen years old Catarina can't fall asleep.  Inconsequently, in the big city adults are moved by desire...  Catarina found she is HIV positive. She wants to drag everyone else along.",
      "release_date": "2011-03-01"
    },
    {
      "vote_count": 9,
      "id": 531309,
      "video": false,
      "vote_average": 4.6,
      "title": "Brightburn",
      "popularity": 127.582,
      "poster_path": "/roslEbKdY0WSgYaB5KXvPKY0bXS.jpg",
      "original_language": "en",
      "original_title": "Brightburn",
      "genre_ids": [
        27,
        878,
        18,
        53
      ],

I need the Python loop to append the paginated results into a single list until I have captured the 500 most popular films.


Desired Output:

Movie_ID  Movie_Title
280960    Catarina and the others
531309    Brightburn
438650    Cold Pursuit
537915    After
50465     Glass
457799    Extremely Wicked, Shockingly Evil and Vile

How do I Paginate JSON data in Python?

Paginated JSON will usually have an object with links to the previous and next JSON pages. To get the previous page, you must send a request to the "prev" URL. To get to the next page, you must send a request to the "next" URL. This will deliver a new JSON with new results and new links for the next and previous pages.

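Many APIs include a `next_url` field (or something similar) to help you loop through all results. The sample response in the question does not show one, but it does include `total_pages`. Let's examine some cases.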

1. No `next_url` field

You can just loop through the pages, incrementing the page number, until the `results` field comes back empty:

import requests  # to make TMDB API calls

# Discover API URL filtered to movies >= 2004 and containing Drama (genre ID 18)
api_key = 'my api key'  # placeholder: substitute your own TMDB key
discover_api_url = (
    'https://api.themoviedb.org/3/discover/movie'
    f'?api_key={api_key}&language=en-US&sort_by=popularity.desc'
    '&include_adult=false&include_video=false'
    '&primary_release_year=>%3D2004&with_genres=18'
)

most_popular_films = []
new_results = True
page = 1
while new_results:
    discover_api = requests.get(discover_api_url + f"&page={page}").json()
    new_results = discover_api.get("results", [])
    most_popular_films.extend(new_results)
    page += 1

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])
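
The loop above keeps requesting pages until the API stops returning results, which is far more than the 500 films asked about. At 20 results per page, 500 films take 25 requests, which already fits inside the 40-calls-per-10-seconds allowance mentioned in the question. Below is a minimal sketch of a capped version with a small pause between calls as a precaution. The names `collect_top`, `fetch_page`, `limit`, and `delay` are hypothetical helpers introduced here for illustration, not part of the TMDB API or of `requests`.

```python
import time


def collect_top(fetch_page, limit=500, delay=0.25):
    """Collect up to `limit` results by requesting pages 1, 2, 3, ...

    `fetch_page` is a hypothetical callable: it takes a page number and
    returns the decoded JSON dict for that page.
    """
    films = []
    page = 1
    while len(films) < limit:
        results = fetch_page(page).get("results", [])
        if not results:  # ran out of pages before reaching the limit
            break
        films.extend(results)
        page += 1
        if len(films) < limit:
            # A 0.25 s pause keeps the rate at or below 40 calls per 10 s.
            time.sleep(delay)
    return films[:limit]


# Example wiring with requests (not executed here):
# films = collect_top(
#     lambda page: requests.get(discover_api_url + f"&page={page}").json()
# )
```

Passing the page fetcher in as a function keeps the pagination logic separate from the HTTP call, so it can be tried out with a fake fetcher before spending real API calls.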

2. Depend on `total_pages` field

import requests  # to make TMDB API calls

# Discover API URL filtered to movies >= 2004 and containing Drama (genre ID 18)
api_key = 'my api key'  # placeholder: substitute your own TMDB key
discover_api_url = (
    'https://api.themoviedb.org/3/discover/movie'
    f'?api_key={api_key}&language=en-US&sort_by=popularity.desc'
    '&include_adult=false&include_video=false'
    '&primary_release_year=>%3D2004&with_genres=18'
)

discover_api = requests.get(discover_api_url).json()
most_popular_films = discover_api["results"]
for page in range(2, discover_api["total_pages"]+1):
    discover_api = requests.get(discover_api_url + f"&page={page}").json()
    most_popular_films.extend(discover_api["results"])

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])
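
As written, this loop walks every page reported by `total_pages` (5085 in the sample response). If only the top 500 films are wanted, the upper bound can be worked out first. A small sketch of that arithmetic follows; `pages_needed` is a hypothetical helper name, and the numbers are taken from the question.

```python
import math


def pages_needed(target, per_page, total_pages):
    """Number of pages to request in order to collect `target` items."""
    return min(math.ceil(target / per_page), total_pages)


# Values from the question: 500 films wanted, 20 per page, 5085 pages available.
last_page = pages_needed(500, 20, 5085)
print(last_page)  # 25

# The loop in case 2 would then become:
# for page in range(2, last_page + 1): ...
```

Taking the minimum with `total_pages` guards against queries that return fewer pages than the target would need.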

3. `next_url` field exists! Yay!

Same idea, only now we check whether the `next_url` field is empty: if it is, we have reached the last page.

import requests  # to make API calls

# Discover API URL filtered to movies >= 2004 and containing Drama (genre ID 18)
api_key = 'my api key'  # placeholder: substitute your own key
discover_api_url = (
    'https://api.themoviedb.org/3/discover/movie'
    f'?api_key={api_key}&language=en-US&sort_by=popularity.desc'
    '&include_adult=false&include_video=false'
    '&primary_release_year=>%3D2004&with_genres=18'
)

discover_api = requests.get(discover_api_url).json()
most_popular_films = discover_api["results"]
# .get() avoids a KeyError if the last page omits the field entirely
while discover_api.get("next_url"):
    discover_api = requests.get(discover_api["next_url"]).json()
    most_popular_films.extend(discover_api["results"])

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])
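
The same pattern can be wrapped in a generator so the caller can stop early, for example after 500 items. This is only a sketch under the assumption that each page carries `results` and `next_url` fields as described in this case; `iter_results`, `get_json`, and `limit` are hypothetical names introduced for illustration.

```python
def iter_results(first_url, get_json, limit=None):
    """Yield items page by page, following `next_url` until it is missing or empty.

    `get_json` is a hypothetical callable: URL in, decoded JSON dict out.
    """
    if limit is not None and limit <= 0:
        return
    url = first_url
    yielded = 0
    while url:
        payload = get_json(url)
        for item in payload.get("results", []):
            yield item
            yielded += 1
            if limit is not None and yielded >= limit:
                return  # stop before requesting another page
        url = payload.get("next_url")  # None or "" ends the loop


# Example wiring with requests (not executed here):
# films = list(iter_results(discover_api_url,
#                           lambda url: requests.get(url).json(),
#                           limit=500))
```

Because the limit is checked right after each item is yielded, no extra page is requested once enough items have been collected.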

Tags: python, restful-url

izzy84


AdamGold
