Python requests does not return the correct HTML

Question

I would really appreciate it if someone could help me with a problem. I am trying to scrape https://www.marketwatch.com/investing/index/xxx, where xxx is a stock symbol, for example https://www.marketwatch.com/investing/index/spx. My code worked for more than a year, but for some reason it no longer works: requesting a page now returns a strange stub of HTML. As you can see, the actual web page is far more complex than the response I get. I also tried BeautifulSoup, as I thought the problem was related to JavaScript, but I get the same result.

Part of code (with requests):

import requests

url = "https://www.marketwatch.com/investing/index/spx"
page = requests.get(url)
print(page.content)

Result:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<link href="about:blank" rel="shortcut icon"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/json3/3.3.2/json3.min.js"></script>
<script src="https://resources.kasadapolyform.io/kpfp.js"></script>
<script src="/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint/script/kpf.js?url=/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint&amp;token=46f828d0-bb88-fcd0-c7ad-47f18d3c13a2"></script>
</head>
<body>
</body>
</html>

I would really appreciate the help.

Edward Minnix · Accepted Answer

As mentioned by Jaxi, the HTML returned (only script tags and an empty body) implies that the page content is rendered almost entirely by JavaScript rather than served as static HTML.
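A quick way to confirm this kind of response programmatically is to check whether the document has scripts but no visible text in its body. The following is a minimal sketch using only the standard library; the names ShellDetector and looks_like_js_shell are made up for this illustration, and the check is a rough heuristic rather than a reliable detector.

```python
from html.parser import HTMLParser


class ShellDetector(HTMLParser):
    """Counts <script> tags and collects visible text found inside <body>."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.in_body = False
        self.in_script = False
        self.body_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
            self.in_script = True
        elif tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Ignore script source; keep only non-blank text inside <body>.
        if self.in_body and not self.in_script and data.strip():
            self.body_text.append(data.strip())


def looks_like_js_shell(html: str) -> bool:
    """Heuristic: scripts are present but <body> carries no visible text."""
    detector = ShellDetector()
    detector.feed(html)
    return detector.script_count > 0 and not detector.body_text
```

With the question's code, this could be called as looks_like_js_shell(page.text). A True result suggests that plain requests will not be enough for that page.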

To work around this, you need a tool that runs the JavaScript and then gives you the resulting HTML.

One example is Selenium, a browser automation tool commonly used for UI testing.
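As a rough sketch of the Selenium approach, assuming the selenium package and a local Chrome installation are available: the helper names make_headless_chrome and fetch_rendered_html, the ready callback, and the timeout values are all choices made for this illustration, not part of Selenium itself. A site that uses bot detection may still refuse to serve content to an automated browser.

```python
import time


def make_headless_chrome():
    """Create a headless Chrome driver (assumes selenium and Chrome are installed)."""
    from selenium import webdriver  # imported lazily, only when a real browser is needed

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(url, ready, driver_factory=make_headless_chrome,
                        timeout=15.0, poll=0.5):
    """Load url in a browser and return the HTML once ready(html) is true.

    If the timeout expires first, return whatever HTML is present at that point.
    """
    driver = driver_factory()
    try:
        driver.get(url)
        deadline = time.monotonic() + timeout
        html = driver.page_source
        # Poll the live DOM until the caller's readiness check passes.
        while not ready(html) and time.monotonic() < deadline:
            time.sleep(poll)
            html = driver.page_source
        return html
    finally:
        driver.quit()  # always release the browser process
```

A call might look like fetch_rendered_html(url, ready=lambda h: "<h1" in h), where the readiness check is a placeholder to be replaced by something specific to the page being scraped. The returned string can then be passed to BeautifulSoup as usual.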

Another is requests_html, a package by Kenneth Reitz (the original author of the requests package). It uses the Chromium browser under the hood to render the page for you. From the README:

>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> r = session.get('http://python-requests.org')
>>> r.html.render()
>>> r.html.search('Python 2 will retire in only {months} months!')['months']
'<time>25</time>'

As a side note, as mentioned by ewindes, you should always make sure that the sites you are scraping permit web scraping. If not as a matter of legality, then as one of courtesy.
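One part of that check can be automated by reading the site's robots.txt with the standard library's urllib.robotparser. The sketch below parses a robots.txt string offline; the is_allowed helper, the sample rules, and the "my-scraper" user agent are hypothetical and only illustrate the call pattern. Note that robots.txt is not the same thing as a site's terms of service, which still need to be read separately.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/investing/index/spx"))
print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/private/data"))
```

To check a live site instead, RobotFileParser also offers set_url() and read(), which download and parse the file before can_fetch() is called.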

Tags:

python

python-3.x

Valme
