
Wikipedia API - get random page(s)

I'm trying to get a JSON result with a set of random pages from Wikipedia, including their titles, content and images.

I've played around with their API sandbox, and so far the best I've got is this:

https://en.wikipedia.org/w/api.php?action=query&list=random&format=json&rnnamespace=0&rnlimit=10

But this only includes the namespace, ID, and title of ten random pages. I would also like to get the content and the images.

Does anyone know how?

Alternatively, I could make do with the title, content and image URLs of a single random page. The best I've got here is:

https://en.wikipedia.org/w/api.php?action=query&generator=random&format=json

Petter asked Nov 09 '15 17:11



2 Answers

You're close. generator=random is the right way to go. You can then use various prop values to get the info you want:

  • Page title is always included.

  • To get the text, use prop=revisions along with rvprop=content.

  • To get all images used on the page, use prop=images.

    Note that this will often include images you're probably not interested in, like icons and flags. To avoid that, you could try prop=pageimages instead, though it doesn't always seem to work. Or you could try using both.

So, the final query could look like this:

https://en.wikipedia.org/w/api.php?format=json&action=query&generator=random&grnnamespace=0&prop=revisions|images&rvprop=content&grnlimit=10
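For illustration, here is a minimal Python sketch of how that query might be built and its response unpacked. The function names and the User-Agent string are made up for this example, and the sketch assumes the default JSON layout, where pages are keyed by page ID under query.pages and the wikitext of a revision sits under the "*" key. It also ignores continuation, so it may not see every image of every page.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_random_query(limit=10):
    """Build a URL equivalent to the query above (hypothetical helper)."""
    params = {
        "format": "json",
        "action": "query",
        "generator": "random",
        "grnnamespace": 0,
        "grnlimit": limit,
        "prop": "revisions|images",
        "rvprop": "content",
    }
    return API_URL + "?" + urlencode(params)


def fetch_json(url):
    """Download and decode a JSON response (performs a network request)."""
    # The User-Agent value is a placeholder chosen for this example.
    request = Request(url, headers={"User-Agent": "random-pages-example/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def extract_pages(response):
    """Collect title, wikitext and image titles from a decoded response."""
    pages = response.get("query", {}).get("pages", {})
    result = []
    for page in pages.values():
        revisions = page.get("revisions", [])
        # Assumption: the wikitext is stored under the "*" key.
        text = revisions[0].get("*", "") if revisions else ""
        images = [image["title"] for image in page.get("images", [])]
        result.append({"title": page.get("title"), "text": text, "images": images})
    return result
```

Calling extract_pages(fetch_json(build_random_query())) should then give a list of dictionaries, one per random page. Note that prop=images only returns file titles (such as File:Example.jpg), not image URLs.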

svick answered Oct 01 '22 04:10


If you'd rather use their REST API, you can request the summary of a random page:

curl -X GET "https://en.wikipedia.org/api/rest_v1/page/random/summary"

Documentation
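The same request could be made from Python along these lines. This is only a sketch: the helper names and the User-Agent string are invented for the example, and it assumes the summary response carries title and extract fields plus optional originalimage and thumbnail objects, each with a source URL.

```python
import json
from urllib.request import Request, urlopen

SUMMARY_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"


def fetch_random_summary():
    """Fetch the summary of one random page (performs a network request)."""
    # The User-Agent value is a placeholder chosen for this example.
    request = Request(SUMMARY_URL, headers={"User-Agent": "random-summary-example/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def summary_fields(summary):
    """Reduce a decoded summary to title, plain-text extract and image URL."""
    # Prefer the full-size image and fall back to the thumbnail if present.
    image = summary.get("originalimage") or summary.get("thumbnail") or {}
    return {
        "title": summary.get("title"),
        "text": summary.get("extract", ""),
        "image_url": image.get("source"),
    }
```

summary_fields(fetch_random_summary()) would then return one dictionary for a single random page. The extract is a short plain-text summary rather than the full article content.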

mancini0 answered Oct 01 '22 06:10