i just want to get content (no link, no categories, no images...just text)
The MediaWiki software, which drives Wikipedia, allows the use of a subset of HTML 5 elements, or tags and their attributes, for presentation formatting.
Wikipedia and other Wikimedia projects are free, collaborative repositories of knowledge, written and maintained by volunteers from around the world. The Wikimedia API gives you open access to add this free knowledge to your projects and apps.
What is the Wikipedia API? The Wikipedia API (official documentation) is supported by the MediaWiki's API and provide access to Wikipedia and other MediaWiki data without interacting with the user interface.
This page is a simple guide to finding that ID. In the desktop view of Wikipedia, in the default skin and most others, the left-hand panel has a "Wikidata item" link, under " tools ". Copy the URL of that link, paste it into a text editor, and read (or copy) the ID from it.
There is no way to get "just the text" from the Wikipedia API. You can either download the HTML of the page (if you do this via index.php rather than api.php, use action=render
to avoid downloading all the skin content) or the wikitext (which you can do via the API or by passing action=raw
to index.php); you will then have to parse it yourself to remove the bits you don't want to keep.
In the HTML output, MediaWiki is generally good about adding classes to various interface elements you might want to filter out; the templates and such created by users are perhaps less so (e.g. the hack for table sorting just puts some text in a display:none
span, no class).
To get the wikitext via the API, use prop=revisions
. To get the rendered HTML, use action=parse
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With