 

How do I get the HTML of a wiki page with Pywikibot?

I'm using pywikibot-core. Before that I used another Python MediaWiki API wrapper, Wikipedia.py, which has a .HTML method. I switched to pywikibot-core because I think it has many more features, but I can't find a similar method. (Beware: I'm not very skilled.)

asked Dec 12 '14 11:12 by Aubrey


2 Answers

I'll post here user283120's second answer, which is more precise than the first one:

Pywikibot core doesn't provide any direct (HTML) way to interact with the wiki, so you should use the API. If you need the HTML anyway, you can easily get it with urllib2.
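A minimal sketch of the "use the API" route, written with the Python 3 standard library rather than Pywikibot: MediaWiki's `action=parse` module with `prop=text` returns the rendered HTML of a page's content area (not the full skin with menus and sidebar). The Commons endpoint and the helper names `parse_api_url`, `html_from_parse_response` and `get_page_html` are illustrative assumptions, and the `User-Agent` string is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the wiki is Commons, as in the example below.
API_URL = "https://commons.wikimedia.org/w/api.php"


def parse_api_url(title, api_url=API_URL):
    """Build an action=parse query asking MediaWiki for the rendered HTML of `title`."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",  # version 2 returns the HTML as a plain string
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def html_from_parse_response(payload):
    """Extract the HTML string from a formatversion=2 action=parse JSON response."""
    data = json.loads(payload)
    if "error" in data:
        raise RuntimeError(data["error"].get("info", "API error"))
    return data["parse"]["text"]


def get_page_html(title):
    """Fetch the rendered HTML of a page through the API (needs network access)."""
    # Placeholder User-Agent; Wikimedia asks clients to send a descriptive one.
    req = urllib.request.Request(
        parse_api_url(title),
        headers={"User-Agent": "html-fetch-sketch/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(req) as resp:
        return html_from_parse_response(resp.read().decode("utf-8"))
```

With Pywikibot you would pass the page title, e.g. `get_page_html(page.title())`.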

This is an example I used to get the HTML of a wiki page on Commons:

```python
import urllib2

# ...
# Build the page URL from its title (spaces become underscores).
url = "https://commons.wikimedia.org/wiki/" + page.title().replace(" ", "_")
html = urllib2.urlopen(url).read().decode('utf-8')
```
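`urllib2` only exists in Python 2. A possible Python 3 equivalent of the snippet, sketched with `urllib.request`, also percent-encodes the title so that characters such as `?` or `&` don't break the URL. The function names `commons_page_url` and `fetch_html` are hypothetical, and the `User-Agent` string is a placeholder.

```python
import urllib.parse
import urllib.request


def commons_page_url(title):
    """Build the /wiki/ URL for a page title on Commons."""
    # Spaces become underscores, as in the original snippet; other unsafe
    # characters are percent-encoded, while ':' and '/' are kept readable.
    return "https://commons.wikimedia.org/wiki/" + urllib.parse.quote(
        title.replace(" ", "_"), safe=":/"
    )


def fetch_html(title):
    """Download the full rendered page (skin included) as a UTF-8 string."""
    req = urllib.request.Request(
        commons_page_url(title),
        headers={"User-Agent": "html-fetch-sketch/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With Pywikibot you would call `fetch_html(page.title())`.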

answered Sep 20 '22 20:09 by Aubrey


"[saveHTML.py] downloads the HTML-pages of articles and images and saves the interesting parts, i.e. the article-text and the footer to a file"

source: https://git.wikimedia.org/blob/pywikibot%2Fcompat.git/HEAD/saveHTML.py
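saveHTML.py belongs to the old compat branch. The following is not its code, only a rough standard-library sketch of the same idea: pull the text of the article body and of the footer out of a downloaded page. It assumes the skin marks them with `id="mw-content-text"` and `id="footer"` and that the markup is well-formed (every non-void tag closed); `SectionTextExtractor`, `extract_sections` and `save_sections` are hypothetical names.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the depth count.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class SectionTextExtractor(HTMLParser):
    """Collect the text inside elements whose id is in `wanted_ids`."""

    def __init__(self, wanted_ids):
        super().__init__(convert_charrefs=True)
        self.wanted_ids = set(wanted_ids)
        self.sections = {i: [] for i in wanted_ids}
        self._stack = []  # (id, depth) of wanted elements we are currently inside
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self._depth += 1
        elem_id = dict(attrs).get("id")
        if elem_id in self.wanted_ids:
            self._stack.append((elem_id, self._depth))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Closing the wanted element itself: stop collecting for it.
        if self._stack and self._stack[-1][1] == self._depth:
            self._stack.pop()
        self._depth -= 1

    def handle_data(self, data):
        if self._stack:  # text goes to the innermost wanted element
            self.sections[self._stack[-1][0]].append(data)


def extract_sections(html, ids=("mw-content-text", "footer")):
    """Return {id: whitespace-normalised text} for each wanted element."""
    parser = SectionTextExtractor(ids)
    parser.feed(html)
    parser.close()
    return {i: " ".join("".join(parts).split()) for i, parts in parser.sections.items()}


def save_sections(html, path):
    """Write the extracted article text and footer to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        for name, text in extract_sections(html).items():
            f.write(f"== {name} ==\n{text}\n\n")
```

It only keeps plain text; saving the inner HTML instead would need the parser to re-serialise tags as well.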

answered Sep 19 '22 20:09 by valepert