I'm trying to get a JSON result with a set of random pages from Wikipedia, including their titles, content and images.
I've played around with their API sandbox, and so far the best I've got is this:
https://en.wikipedia.org/w/api.php?action=query&list=random&format=json&rnnamespace=0&rnlimit=10
But this only includes the namespace, ID, and title of ten random pages. I would like to get the content and images as well.
Does anyone know how?
Alternatively, I could make do with the title, content and image URLs of a single random page. The best I've got here is:
https://en.wikipedia.org/w/api.php?action=query&generator=random&format=json
This is easily done by going to the Special:Random page. The link is on the left side of the page, under the Wikipedia logo, and reads "Show any page". Clicking it takes you directly to a random article page without having to do anything else.
Wikipedia and other Wikimedia projects are free, collaborative repositories of knowledge, written and maintained by volunteers from around the world. The Wikimedia API gives you open access to add this free knowledge to your projects and apps.
There are three main methods for retrieving page content via the API:
Get the contents of a page using the Revisions API (as wikitext).
Get the contents of a page using the Parse API (as HTML or wikitext).
Get plain text or limited HTML extracts of a page using the API of the TextExtracts extension.
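To make the three methods concrete, here is a minimal sketch that only builds the request URLs for each one. The function names are made up for illustration, and the parameter choices (rvslots=main, prop=text, explaintext) are one reasonable set of options rather than the only valid ones.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_url(title):
    """Revisions API: latest revision content as wikitext."""
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
    }
    return API_URL + "?" + urlencode(params)


def parse_url(title):
    """Parse API: rendered HTML of the page."""
    params = {
        "action": "parse",
        "format": "json",
        "page": title,
        "prop": "text",
    }
    return API_URL + "?" + urlencode(params)


def extract_url(title):
    """TextExtracts extension: plain-text extract of the page."""
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "extracts",
        "explaintext": 1,
    }
    return API_URL + "?" + urlencode(params)
```

Each function returns a URL you can paste into a browser or pass to any HTTP client; nothing is fetched here.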
When you click random article it generates a target random number and then returns the article whose recorded random number is closest to this target. If you are interested you can read the actual code here.
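As a toy illustration of that description (not MediaWiki's actual code), the selection can be sketched like this. The page list, the stored random values, and the function name are all hypothetical, and "closest" is taken here to mean smallest absolute difference.

```python
import random


def pick_random_page(pages, target=None):
    """Return the title whose stored random number is closest to the target.

    pages is a list of (title, stored_random) pairs, where stored_random
    is a float in [0, 1) assigned to the page ahead of time.
    """
    if target is None:
        # Generate the target random number for this request.
        target = random.random()
    closest = min(pages, key=lambda page: abs(page[1] - target))
    return closest[0]


# Hypothetical data for illustration only.
example_pages = [("Apple", 0.1), ("Banana", 0.5), ("Cherry", 0.9)]
```

Calling pick_random_page(example_pages, target=0.45) picks "Banana", since 0.5 is the stored value nearest to 0.45.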
You're close. generator=random is the right way to go. You can then use various prop values to get the info you want:
Page title is always included.
To get the text, use prop=revisions along with rvprop=content.
To get all images used on the page, use prop=images.
Note that this will often include images you're probably not interested in, like icons and flags. To avoid that, you might instead try prop=pageimages, though it doesn't always seem to work. Or you could try using both.
So, the final query could look like this:
https://en.wikipedia.org/w/api.php?format=json&action=query&generator=random&grnnamespace=0&prop=revisions|images&rvprop=content&grnlimit=10
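If you want to call that query from code, a rough Python sketch could look like the following. It assumes the default JSON layout for this query, where pages are keyed by page ID and the revision text sits under the "*" key. The helper names and the User-Agent string are placeholders, not anything the API defines. Some pages may come back without text or images, since the API can split results across continuation batches, which this sketch does not follow.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_random_query(limit=10):
    """Build the same query as the URL above, with a configurable limit."""
    params = {
        "format": "json",
        "action": "query",
        "generator": "random",
        "grnnamespace": 0,
        "grnlimit": limit,
        "prop": "revisions|images",
        "rvprop": "content",
    }
    return API_URL + "?" + urlencode(params)


def summarize_pages(response):
    """Pull title, wikitext, and image file titles out of a decoded response."""
    pages = response.get("query", {}).get("pages", {})
    results = []
    for page in pages.values():
        revisions = page.get("revisions", [])
        # Assumed layout: content is stored under "*" in the first revision.
        text = revisions[0].get("*") if revisions else None
        images = [image["title"] for image in page.get("images", [])]
        results.append({"title": page.get("title"), "text": text, "images": images})
    return results


def fetch_random_pages(limit=10):
    """Perform the request (needs network access) and summarize the result."""
    # Placeholder User-Agent; replace with something identifying your app.
    headers = {"User-Agent": "random-pages-example/0.1"}
    request = Request(build_random_query(limit), headers=headers)
    with urlopen(request) as reply:
        return summarize_pages(json.load(reply))
```

Note that prop=images gives file titles such as File:Example.png, not direct URLs; getting URLs would take a further query.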
If you'd rather use their REST API, you can request the summary of a random page:
curl -X GET "https://en.wikipedia.org/api/rest_v1/page/random/summary"
Documentation
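The same REST call can be sketched in Python as below. It assumes the summary response carries title, extract, and optional originalimage or thumbnail objects with a source URL. The function names and the User-Agent string are placeholders.

```python
import json
from urllib.request import Request, urlopen

RANDOM_SUMMARY_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"


def pick_fields(summary):
    """Keep only the title, extract text, and an image URL if one is present."""
    # Prefer the full-size image, fall back to the thumbnail, else no image.
    image = summary.get("originalimage") or summary.get("thumbnail") or {}
    return {
        "title": summary.get("title"),
        "extract": summary.get("extract"),
        "image_url": image.get("source"),
    }


def fetch_random_summary():
    """Perform the request (needs network access); redirects are followed."""
    # Placeholder User-Agent; replace with something identifying your app.
    headers = {"User-Agent": "random-summary-example/0.1"}
    request = Request(RANDOM_SUMMARY_URL, headers=headers)
    with urlopen(request) as reply:
        return pick_fields(json.load(reply))
```

This returns a short extract rather than the full page content, so it fits the single-page case better than the bulk one.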