
Scrapy/BeautifulSoup simulating 'clicking' a button in order to load a section of the website

To give a very simple example, let's take this site: https://www.cardmarket.com/en/Magic/Products/Booster-Boxes/Modern-Horizons-2-Collector-Booster-Box

As you can see, to load more listings you need to press the blue "SHOW MORE RESULTS" button, and several times at that. In a nutshell, is there a way to "click" this button using Scrapy or Beautiful Soup in order to gain access to all of the listings on that site? If so, how do I do that? If not, what are the most efficient tools that can do this so that I can scrape the site? I've heard of Selenium, but I've also heard that it's much slower than Scrapy/Beautiful Soup, so I'd prefer to stick with those two or use another tool.

Entman asked Oct 19 '25 12:10


2 Answers

I see that this website loads its content using AJAX, also known as "dynamic page loading", so instead of the resource-heavy Selenium you can use Requests + bs4 to get it done. To start, open the web page and wait for the initial load to finish, then press Ctrl+Shift+I to open the developer tools ("Inspect"), go to the "Network" tab, and click the "Load more" button to load more content. A new request will then appear in the request list.

If you look at that request's response, you'll see that it is base64 encoded. Right-click the request and copy it as cURL.

Now that the cURL command is in your clipboard, you can easily convert it to Python code using an online cURL-to-Python converter or Postman. There you have it.

You can base64 decode to get the response and parse it.
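The decoding step might look roughly like the sketch below. It assumes the whole response body is one base64 string containing UTF-8 HTML; if the real response wraps the string in JSON, pull the relevant field out first. The function name and the sample HTML are made up for illustration and are not taken from the site.

```python
import base64


def decode_listing_html(encoded: str) -> str:
    """Decode a base64 string into the HTML text it carries.

    ASSUMPTION: the payload is UTF-8 HTML. Check the real response in the
    Network tab before relying on this.
    """
    return base64.b64decode(encoded).decode("utf-8")


# Offline demo with a made-up fragment standing in for the real response.
sample_html = '<div class="row"><span class="price">199,00</span></div>'
sample_response = base64.b64encode(sample_html.encode("utf-8")).decode("ascii")

decoded = decode_listing_html(sample_response)
print(decoded)
```

The decoded string can then be handed to Beautiful Soup, for example `BeautifulSoup(decoded, "html.parser")`, and searched like any other page.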

Muhammad Hassan answered Oct 21 '25 02:10


This seems like a good use case for Selenium. You could use it to simulate a browser session and then hand the page source off to Beautiful Soup as needed.

Try something like this:

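from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

# Desired URL
url = "https://www.cardmarket.com/en/Magic/Products/Booster-Boxes/Modern-Horizons-2-Collector-Booster-Box"

# Create a new Firefox session
driver = webdriver.Firefox()
driver.implicitly_wait(30)  # wait up to 30 seconds for elements to appear
driver.get(url)

# Get the button and click it
# (find_element_by_id was removed in Selenium 4; use find_element with By.ID)
load_more_button = driver.find_element(By.ID, "loadMoreButton")
load_more_button.click()  # click the load more button

# Pass the rendered page to BS4, naming the parser explicitly
soup = BeautifulSoup(driver.page_source, "html.parser")

# Close the browser when you are done
driver.quit()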
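Since the question says the button has to be pressed several times, one possible extension is to keep clicking until the button is gone. This is only a sketch: the helper name, the `max_clicks` cap and the fixed pause are my own choices, and it assumes the site removes or hides the button once everything is loaded, which I have not verified.

```python
import time


def click_until_gone(driver, button_id="loadMoreButton", max_clicks=50, pause=1.0):
    """Click the button with the given id until it disappears or is hidden.

    `driver` is a Selenium WebDriver. The locator string "id" is the value
    of selenium's By.ID constant. Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        # find_elements returns an empty list instead of raising when
        # nothing matches.
        buttons = driver.find_elements("id", button_id)
        if not buttons or not buttons[0].is_displayed():
            break  # ASSUMPTION: no visible button means nothing left to load
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # crude wait for the AJAX content to render
    return clicks
```

Call `click_until_gone(driver)` after `driver.get(url)` and before reading `driver.page_source`. Note that with a long implicit wait, the final `find_elements` call can block for the full wait before returning an empty list, so a shorter implicit wait may be preferable here.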

If You Want To Avoid Selenium:

The "Load More" button on the site you've linked uses AJAX requests to load more data. If you really want to avoid Selenium, you could try using the requests library to replicate the same AJAX request that the button makes when it is clicked.

You'll need to monitor the network tab in your browser to figure out the necessary headers. It's likely going to take some fiddling to get it just right.
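As a rough starting point, replaying the request could look like the sketch below. Everything concrete in it is a placeholder: the endpoint URL, the form fields and the header values are hypothetical and must be replaced with what the network tab actually shows, and the request may turn out to be a GET rather than a POST.

```python
def replay_load_more(session, url, form_data, headers):
    """Send the same form data the button sends and return the raw body.

    `session` is expected to be a requests.Session (anything with a
    compatible .post method works). ASSUMPTION: the button issues a POST.
    """
    response = session.post(url, data=form_data, headers=headers, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


# HYPOTHETICAL placeholders: copy the real values from your browser.
AJAX_URL = "https://example.com/ajax/load-more"
FORM_DATA = {"page": "2"}
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "https://example.com/listing",
}

# Usage (needs the requests package and real values):
#   import requests
#   body = replay_load_more(requests.Session(), AJAX_URL, FORM_DATA, HEADERS)
```

Passing the session in keeps cookies between calls, which such endpoints often require, and makes it easy to load the listing page once with the same session before replaying the AJAX call.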

Potentially Relevant:

Simulating ajax request with python using requests lib

Adam E. answered Oct 21 '25 03:10


