Python requests vs. urllib2

Question

I have used the requests library many times, and I know it has a ton of advantages. However, I was trying to retrieve the following Wikipedia page:

https://en.wikipedia.org/wiki/Talk:Land_value_tax

and requests.get retrieves it partially:

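import requests
response = requests.get('https://en.wikipedia.org/wiki/Talk:Land_value_tax', verify=False)  # verify=False skips TLS certificate checks
html = response.text  # body decoded to text using response.encoding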

I tried the same request with urllib2.urlopen, and it retrieves the complete page:

import urllib2  # Python 2 only; the Python 3 equivalent is urllib.request
html = urllib2.urlopen('https://en.wikipedia.org/wiki/Talk:Land_value_tax').read()
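A minimal diagnostic sketch for locating where the two downloads diverge: it fetches the URL once with requests and once with urllib.request (the Python 3 successor of urllib2) and reports the first byte at which the bodies differ. It compares `response.content` (bytes) rather than `response.text`, so text decoding is taken out of the picture. The helper names `fetch_both` and `first_difference` are hypothetical.

```python
import urllib.request

URL = "https://en.wikipedia.org/wiki/Talk:Land_value_tax"


def first_difference(a: bytes, b: bytes) -> int:
    """Return the index of the first differing byte, or -1 if a and b are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # The shared prefix matches; the bodies differ only if one is longer.
    return -1 if len(a) == len(b) else min(len(a), len(b))


def fetch_both(url: str) -> tuple[bytes, bytes]:
    """Download url with requests and with urllib.request; return both bodies as bytes."""
    import requests  # third-party; imported here so first_difference works without it

    # .content is the body as bytes (gzip/deflate already undone by requests).
    via_requests = requests.get(url, timeout=30).content
    with urllib.request.urlopen(url, timeout=30) as resp:
        via_urllib = resp.read()
    return via_requests, via_urllib


if __name__ == "__main__":
    a, b = fetch_both(URL)
    print(f"requests: {len(a)} bytes, urllib: {len(b)} bytes")
    idx = first_difference(a, b)
    print("identical" if idx == -1 else f"first difference at byte {idx}")
```

Some difference may show up even between two fetches with the same library, since the page can embed per-request values; a large gap in length is the more telling signal.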

Does anyone know why this happens and how to solve it using requests?

By the way, judging by the number of times this post has been viewed, people are interested in the differences between these two libraries. If anyone knows of other differences between them, I would appreciate it if they edited this question or posted an answer listing them.
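One concrete difference is what each library sends by default: they identify themselves with different User-Agent strings, and requests also advertises gzip/deflate support through Accept-Encoding, so a server is free to answer them differently. The sketch below, built around a hypothetical helper `default_headers_by_library`, prints the headers each library adds on its own; headers that the underlying http.client adds at send time (such as Host) are not included.

```python
import urllib.request


def default_headers_by_library() -> dict[str, dict[str, str]]:
    """Collect the headers each library adds by default, without user overrides."""
    headers = {}
    # urllib.request: the default opener adds a User-agent header ("Python-urllib/X.Y").
    opener = urllib.request.build_opener()
    headers["urllib"] = dict(opener.addheaders)
    try:
        import requests  # optional: only reported if installed
    except ImportError:
        return headers
    # requests: User-Agent, Accept-Encoding, Accept and Connection defaults.
    headers["requests"] = dict(requests.utils.default_headers())
    return headers


if __name__ == "__main__":
    for library, hdrs in default_headers_by_library().items():
        print(library)
        for name, value in hdrs.items():
            print(f"  {name}: {value}")
```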

Igor Savinkin · Accepted Answer

It seems to me that the problem lies in the scripting on the target page. Part of the content is rendered by JavaScript (in particular, I found calls to MediaWiki). You can use a web sniffer to identify these calls.
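Besides a network sniffer (for example, the Network tab of a browser's developer tools), the downloaded HTML itself shows which scripts the page starts loading. This sketch uses the standard-library `html.parser` to list external `<script src>` URLs and count inline scripts; the names `ScriptCollector` and `list_scripts` are hypothetical. Scripts that other scripts inject later will not appear here, which is why the sniffer remains the more complete view.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collect external <script src> URLs and count inline <script> blocks."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.external: list[str] = []
        self.inline_count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative and protocol-relative URLs against the page URL.
            self.external.append(urljoin(self.base_url, src))
        else:
            self.inline_count += 1


def list_scripts(html: str, base_url: str) -> tuple[list[str], int]:
    """Return (external script URLs, number of inline scripts) found in html."""
    parser = ScriptCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.external, parser.inline_count
```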

What to do? If you want to retrieve the whole page content, you had better plug in one of the libraries that evaluate the page's JavaScript. Read more here.
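Playwright is one library of this kind (an assumed choice here; Selenium is another). A minimal sketch with its synchronous API follows; it needs `pip install playwright` followed by `playwright install chromium`, and the function name `fetch_rendered_html` is hypothetical.

```python
def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load url in headless Chromium, let its JavaScript run, and return the resulting DOM."""
    # Imported lazily so this module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        try:
            page = browser.new_page()
            # "networkidle" waits until no network connections remain for 500 ms,
            # giving script-triggered requests a chance to finish.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()  # serialized DOM after the scripts ran
        finally:
            browser.close()


if __name__ == "__main__":
    html = fetch_rendered_html("https://en.wikipedia.org/wiki/Talk:Land_value_tax")
    print(len(html))
```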

Update

I am not interested in retrieving the page's statistics or the JS libraries loaded from MediaWiki. I only need the whole content of the page (through scraping, not the MediaWiki API).

The issue is that those JS calls to other resources (including MediaWiki) are what make it possible to render the WHOLE page in the client. Since the library does not support JavaScript execution, the JS is never run, so the parts of the page loaded from other resources are never fetched, and the resulting page is incomplete.
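The reasoning above suggests a quick check: if text that you can see in the browser is missing from the HTML returned by requests, that part of the page was most likely added by JavaScript. A rough, regex-based sketch of that check follows; it is not a real HTML parser, and the names `visible_text` and `appears_in_static_html` are hypothetical.

```python
import re
from html import unescape


def visible_text(html: str) -> str:
    """Rough text extraction: drop <script>/<style> bodies and tags, unescape, collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return " ".join(unescape(text).split())


def appears_in_static_html(html: str, snippet: str) -> bool:
    """True if snippet (as seen in a browser) is already present in the HTML the server sent."""
    return " ".join(snippet.split()) in visible_text(html)
```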


Tags: python, python-requests, urllib2, web-scraping, partial

1man
