I am testing using the requests
module to get the content of a webpage. But when I look at the content I see that it does not get the full content of the page.
Here is my code:
import requests
from bs4 import BeautifulSoup
url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
Also on the chrome web-browser if I look at the page source I do not see the full content.
Is there a way to get the full content of the example page that I have provided?
The Python error "ModuleNotFoundError: No module named 'requests'" occurs for multiple reasons: Not having the requests package installed by running pip install requests . Installing the package in a different Python version than the one you're using. Installing the package globally and not in your virtual environment.
The reason why request might be blocked is that, for example in Python requests library, default user-agent is python-requests and websites understands that it's a bot and might block a request in order to protect the website from overload, if there's a lot of requests being sent.
The get() method sends a GET request to the specified url.
The page is rendered with JavaScript making more requests to fetch additional data. You can fetch the complete page with selenium.
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()
url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
print(soup.prettify())
For other solutions see my answer to Scraping Google Finance (BeautifulSoup)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With