I'm attempting to scrape some data from www.ksl.com/auto/ using Python Requests and Beautiful Soup. I can get the results from the first search page but not from subsequent pages. When I request the second page using the same URL Chrome constructs when I click the "Next" button, I get a set of results that no longer matches my search query. I've found other questions on Stack Overflow that discuss Ajax calls that load subsequent pages, and that suggest using Chrome's Developer Tools to examine the request. None of that has helped with this problem, which I've also run into on other sites.
Here is an example query that returns only Acuras on the site. When you advance to the second page in the browser, the URL is simply this: https://www.ksl.com/auto/search/index?page=1. When I use Requests to hit those two URLs, the second set of search results is not limited to Acuras. Is there perhaps a cookie that my browser is passing back to the server to preserve my filters?
I would appreciate any advice someone can give about how to get subsequent pages of the results I searched for.
Here is the simple code I'm using:
from requests import get
from bs4 import BeautifulSoup
page1 = get('https://www.ksl.com/auto/search/index?keyword=&make%5B%5D=Acura&yearFrom=&yearTo=&mileageFrom=&mileageTo=&priceFrom=&priceTo=&zip=&miles=25&newUsed%5B%5D=All&page=0&sellerType=&postedTime=&titleType=&body=&transmission=&cylinders=&liters=&fuel=&drive=&numberDoors=&exteriorCondition=&interiorCondition=&cx_navSource=hp_search&search.x=63&search.y=8&search=Search+raquo%3B').text
page2 = get('https://www.ksl.com/auto/search/index?page=1').text
soup = BeautifulSoup(page1, 'html.parser')
listings = soup.find_all("div", {"class": "srp-listing-body-right"})
listings[0] # An Acura - success!
soup2 = BeautifulSoup(page2, 'html.parser')
listings2 = soup2.find_all("div", {"class": "srp-listing-body-right"})
listings2[0] # Not an Acura. :(
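Try this. Create a Session object and make both requests through it. A Session keeps the cookies the server sets on the first request and sends them back with the next one, which is likely how the site remembers your search filters when you request the next page.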
import requests
from bs4 import BeautifulSoup
s = requests.Session()  # Add this line: one session, so cookies from the first request are reused
page1 = s.get('https://www.ksl.com/auto/search/index?keyword=&make%5B%5D=Acura&yearFrom=&yearTo=&mileageFrom=&mileageTo=&priceFrom=&priceTo=&zip=&miles=25&newUsed%5B%5D=All&page=0&sellerType=&postedTime=&titleType=&body=&transmission=&cylinders=&liters=&fuel=&drive=&numberDoors=&exteriorCondition=&interiorCondition=&cx_navSource=hp_search&search.x=63&search.y=8&search=Search+raquo%3B').text
page2 = s.get('https://www.ksl.com/auto/search/index?page=1').text
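If you need more than two pages, the same idea extends to a loop. The following is only a minimal sketch: the helper name `fetch_result_pages` and its parameters are made up for illustration, and it assumes the site keeps accepting a bare `page` parameter for later pages as long as the requests come from the same session.

```python
def fetch_result_pages(session, search_url, index_url, last_page):
    """Return the HTML of pages 0..last_page, all fetched through one session.

    `session` is expected to behave like requests.Session. The first request
    carries the full search query; later requests pass only `page` and rely
    on whatever state (for example cookies) the session kept from the first
    response. The function name and arguments are hypothetical.
    """
    # Page 0: the full search URL establishes the filtered search.
    html_pages = [session.get(search_url).text]

    # Pages 1..last_page: same session, only the page number changes.
    for page_number in range(1, last_page + 1):
        response = session.get(index_url, params={"page": page_number})
        html_pages.append(response.text)

    return html_pages
```

With Requests, this could be called as `fetch_result_pages(requests.Session(), search_url, 'https://www.ksl.com/auto/search/index', 3)`, where `search_url` is the long Acura query from the question. Each returned string can then be passed to `BeautifulSoup(html, 'html.parser')` as before.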