Python 3: using requests does not get the full content of a web page

I am testing the requests module to get the content of a web page, but when I look at the content I see that it does not contain the full page.

Here is my code:

import requests
from bs4 import BeautifulSoup

url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

Also, in the Chrome browser, when I view the page source I do not see the full content either.
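Both observations have the same cause: requests, like Chrome's "view page source", only shows the HTML the server sends and does not run the page's JavaScript. The sketch below reproduces this locally using only the standard library. A small http.server serves a hypothetical page whose list items are created by a script, and a plain HTTP fetch (http.client here, standing in for requests) sees the empty container but no items. The page markup, the item names and the helper names are made up for illustration; this is not Nordstrom's actual page.

```python
import http.client
import threading
from html.parser import HTMLParser
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page: the <ul> is empty in the HTML the server sends;
# the <li> items are only created when a browser runs the script.
PAGE = b"""<html><body>
<ul id="products"></ul>
<script>
  for (const name of ["Dress A", "Dress B"]) {
    const item = document.createElement("li");
    item.textContent = name;
    document.getElementById("products").appendChild(item);
  }
</script>
</body></html>"""


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):  # silence the default request logging
        pass


class StartTagCollector(HTMLParser):
    """Records every start tag in the markup; like requests, it never runs scripts."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def fetch_start_tags_without_js():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        html = conn.getresponse().read().decode()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    collector = StartTagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(fetch_start_tags_without_js())  # ['html', 'body', 'ul', 'script']
```

No `li` tag appears in the result: the items only exist in a browser's DOM after the script has run, which is why neither requests nor "view page source" shows them.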

Is there a way to get the full content of the example page that I have provided?

asked Dec 9, 2017 at 16:12 by TJ1


1 Answer

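The page is rendered with JavaScript, which makes further requests to fetch additional data after the initial HTML has loaded. requests only downloads that initial HTML and does not execute JavaScript, so it never sees the extra data. You can fetch the fully rendered page with Selenium, which drives a real browser.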
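To make "more requests to fetch additional data" concrete, here is a minimal local sketch, assuming a hypothetical site where `/` returns an HTML shell and the product data lives at a separate JSON endpoint (`/api/products`, an invented path) that the page's script would call. The HTML shell contains none of the data, while the separate request does. On a real site such requests can be seen in the browser's DevTools Network tab; the endpoint names and payload here are illustrative only.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site: "/" is an HTML shell; the product data lives at a separate
# JSON endpoint that the page's JavaScript would call after loading.
SHELL = b'<html><body><ul id="products"></ul><script src="/app.js"></script></body></html>'
PRODUCTS = json.dumps([{"name": "Dress A"}, {"name": "Dress B"}]).encode()

ROUTES = {
    "/": ("text/html", SHELL),
    "/api/products": ("application/json", PRODUCTS),
}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        content_type, body = ROUTES.get(self.path, ("text/plain", b"not found"))
        self.send_response(200 if self.path in ROUTES else 404)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence the default request logging
        pass


def get(port, path):
    """Plain HTTP GET; like requests, it does not execute any JavaScript."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        html = get(server.server_port, "/").decode()
        products = json.loads(get(server.server_port, "/api/products"))
    finally:
        server.shutdown()
        server.server_close()
    return html, products


if __name__ == "__main__":
    html, products = demo()
    print("Dress A" in html)              # False: the shell has no product data
    print([p["name"] for p in products])  # ['Dress A', 'Dress B']
```

A browser (or Selenium) performs this second request automatically when it runs the page's scripts; a plain HTTP client only ever gets the first response unless you request the data endpoint yourself.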

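from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"

driver = webdriver.Chrome()  # launches a real Chrome instance, which runs the page's JavaScript
try:
    # get() blocks until the page's load event; data fetched later may need an explicit wait
    driver.get(url)
    # page_source is the current DOM after scripts have run, not the original HTML
    soup = BeautifulSoup(driver.page_source, 'html.parser')
finally:
    driver.quit()  # always close the browser, even if loading fails

print(soup.prettify())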

For other solutions, see my answer to "Scraping Google Finance (BeautifulSoup)".

answered Sep 23, 2022 at 13:09 by Dan-Dev