
Python requests vs. urllib2

I have used the requests library many times and I know it has a ton of advantages. However, I was trying to retrieve the following Wikipedia page:

https://en.wikipedia.org/wiki/Talk:Land_value_tax

and requests.get retrieves it only partially:

import requests
response = requests.get('https://en.wikipedia.org/wiki/Talk:Land_value_tax', verify=False)
html = response.text

I then tried urllib2.urlopen, and it retrieves the same page completely:

import urllib2  # Python 2; in Python 3 this is urllib.request
html = urllib2.urlopen('https://en.wikipedia.org/wiki/Talk:Land_value_tax').read()

Does anyone know why this happens and how to solve it using requests?
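One way to narrow this down is to measure where the two retrieved copies actually diverge, rather than eyeballing the output. Below is a minimal sketch in Python 3 (so `urllib.request` stands in for urllib2). The helper names `first_difference`, `compare_pages`, and `fetch_both` are hypothetical, and decoding the urllib response as UTF-8 is an assumption, not something confirmed in the question.

```python
from urllib.request import urlopen


def first_difference(a: str, b: str) -> int:
    """Return the index of the first differing character, or -1 if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string is a prefix of the other (or they are identical).
    return -1 if len(a) == len(b) else min(len(a), len(b))


def compare_pages(a: str, b: str) -> dict:
    """Summarize how two copies of the same page differ."""
    return {
        "len_a": len(a),
        "len_b": len(b),
        "first_difference": first_difference(a, b),
        # A missing closing tag hints that the body was cut off in transit.
        "a_closed": "</html>" in a.lower(),
        "b_closed": "</html>" in b.lower(),
    }


def fetch_both(url: str) -> dict:
    """Fetch the URL with both libraries and compare the results.

    Needs network access and the third-party requests package, so it is
    imported here rather than at module level.
    """
    import requests

    via_requests = requests.get(url, timeout=30).text
    with urlopen(url, timeout=30) as resp:
        # Assumption: the page is UTF-8 encoded.
        via_urllib = resp.read().decode("utf-8", errors="replace")
    return compare_pages(via_requests, via_urllib)
```

Calling `fetch_both('https://en.wikipedia.org/wiki/Talk:Land_value_tax')` would show whether the requests copy is a strict prefix of the other one (a truncated body) or whether the two differ somewhere in the middle.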

By the way, looking at the number of times this post has been viewed, I realized that people are interested in knowing the differences between these two libraries. If anyone knows about other differences between them, I'd appreciate it if they edited this question or posted an answer with those differences.

1man asked Jun 19 '26 18:06


1 Answer

It seems to me that the problem lies in the scripting on the target page. Some of the content is rendered by JavaScript (in particular, I found calls to MediaWiki). You can confirm this by inspecting the page's requests with a web sniffer.

What to do? If you want to retrieve the whole page content, you had better plug in one of the libraries that evaluate in-page JavaScript.

Update

I am not interested in retrieving the whole page and statistics or JS libraries retrieved from MediaWiki. I only need the whole content of the page (through scraping, not MediaWiki API).

The issue is that those JS calls to other resources (including MediaWiki) are what make it possible to render the WHOLE page on the client. Since the library does not support JS execution, the JS is not executed, so the parts of the page that come from those other resources are never loaded, and the retrieved page is not whole.
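A quick way to check this hypothesis against the HTML you actually received is to count the scripts it references and see whether the document was closed properly. This is only a rough sketch using the standard library's `html.parser`; the names `ScriptCounter` and `inspect_html` are made up for illustration, and a high script count does not by itself prove that the missing content is script-generated.

```python
from html.parser import HTMLParser


class ScriptCounter(HTMLParser):
    """Collect <script> tags and note whether </html> was seen."""

    def __init__(self):
        super().__init__()
        self.external = []          # src values of external scripts
        self.inline = 0             # number of inline <script> blocks
        self.saw_html_close = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            else:
                self.inline += 1

    def handle_endtag(self, tag):
        if tag == "html":
            self.saw_html_close = True


def inspect_html(html: str) -> dict:
    """Return a small report about the scripts found in an HTML string."""
    parser = ScriptCounter()
    parser.feed(html)
    parser.close()
    return {
        "external_scripts": parser.external,
        "inline_scripts": parser.inline,
        "closed": parser.saw_html_close,
    }
```

If `closed` is False, the response body was probably cut short and JavaScript is not the explanation. If it is True and the external scripts point at loaders on the same site, then content filled in on the client side becomes more plausible.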

Igor Savinkin answered Jun 21 '26 08:06


