To give a very simple example, let's take this site: https://www.cardmarket.com/en/Magic/Products/Booster-Boxes/Modern-Horizons-2-Collector-Booster-Box
As you can see, in order to load more listings you need to press the blue "SHOW MORE RESULTS" button, several times at that. In a nutshell, is there a way to "click" this button using Scrapy or Beautiful Soup in order to gain access to all of the listings on that site? If so, how do I do that? If not, what are the most efficient tools that can do this so I can scrape the site? I've heard of Selenium, but I've also heard it's much slower than Scrapy/BeautifulSoup, so I'd prefer to do this with those two, or with another tool.
I see that this website loads content using AJAX, also known as "dynamic page loading", so instead of using resource-heavy Selenium you can use Requests + bs4 to get it done.
To start, open the web page and wait for it to finish its initial load, then press Ctrl+Shift+I to open the developer tools, go to the Network tab, and click the "Load more" button to load more content. You'll see the corresponding AJAX request appear in the request list.
If you look at the response, you'll see it is base64 encoded. Right-click the request and copy it as cURL.
Now that you have the cURL request on your clipboard, you can easily convert it to Python code using an online cURL-to-Python converter or Postman. There you have it.
You can then base64-decode the response body and parse it.
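As a rough sketch of what the converted code can end up looking like (the endpoint URL, headers, and parameters below are placeholders, and the real call may be a POST, so substitute the values from the cURL request you copied):
import base64

import requests
from bs4 import BeautifulSoup

# Placeholder endpoint, headers and parameters -- take the real values from
# the cURL request copied out of the Network tab (it may be a POST instead).
ajax_url = "https://www.cardmarket.com/en/Magic/AjaxAction"
headers = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",
}
params = {"page": 2}

response = requests.get(ajax_url, headers=headers, params=params)
response.raise_for_status()

# The response body is base64 encoded; decode it back into markup
decoded = base64.b64decode(response.text).decode("utf-8")

# Parse the decoded markup for the extra listings
soup = BeautifulSoup(decoded, "html.parser")
print(soup.get_text()[:500])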
This seems like a good use case for Selenium. You could use it to simulate a browser session and then hand the page source off to Beautiful Soup as needed.
Try something like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

# Desired URL
url = "https://www.cardmarket.com/en/Magic/Products/Booster-Boxes/Modern-Horizons-2-Collector-Booster-Box"

# Create a new Firefox session
driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)

# Get the button and click it
python_button = driver.find_element(By.ID, "loadMoreButton")
python_button.click()  # click the load-more button

# Pass the rendered page source to BS4
soup = BeautifulSoup(driver.page_source, "html.parser")
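Since the listings only appear after the button has been pressed several times, here is a minimal sketch of clicking it in a loop until it disappears; the id "loadMoreButton" is carried over from the snippet above and may not match the real page, so verify the locator in the developer tools first.
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import ElementNotInteractableException, NoSuchElementException
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get("https://www.cardmarket.com/en/Magic/Products/Booster-Boxes/Modern-Horizons-2-Collector-Booster-Box")

while True:
    try:
        # Assumed locator -- confirm the real id/class in the dev tools
        driver.find_element(By.ID, "loadMoreButton").click()
        time.sleep(2)  # give the AJAX call time to append the new listings
    except (NoSuchElementException, ElementNotInteractableException):
        break  # button is gone or no longer clickable, so everything is loaded

soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()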
The "Load More" button on the site you've linked is using AJAX requests to load more data. If you really want to avoid using Selenium then you could try to use the requests
library to replicate the same AJAX request that the button making when it is clicked.
You'll need to monitor the network tab in your browser to figure out the necessary headers. It's likely going to take some fiddling to get it just right.
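A minimal sketch of that approach, using a requests.Session so the cookies from the normal page load are reused; the AJAX endpoint, headers, and form data below are placeholders and need to be replaced with whatever the Network tab shows:
import requests

session = requests.Session()

page_url = "https://www.cardmarket.com/en/Magic/Products/Booster-Boxes/Modern-Horizons-2-Collector-Booster-Box"
session.get(page_url, headers={"User-Agent": "Mozilla/5.0"})  # pick up cookies first

ajax_url = "https://www.cardmarket.com/en/Magic/AjaxAction"  # placeholder endpoint
headers = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",  # typical marker for AJAX calls
    "Referer": page_url,
}
payload = {"page": 2}  # replace with the form data the real request sends

resp = session.post(ajax_url, headers=headers, data=payload)
print(resp.status_code, resp.text[:200])  # tweak headers/payload until this matches the browser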
Potentially Relevant:
Simulating ajax request with python using requests lib