Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get more info within only one geosearch call via Wikipedia API?

I am using an API call similar to http://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=41.426140|26.099319.

I returns something like this

<?xml version="1.0"?>
<api>
  <query>
    <geosearch>
      <gs pageid="27460829" ns="0" title="Kostilkovo" lat="41.416666666667" lon="26.05" dist="4245.1" primary="" />
      <gs pageid="27460781" ns="0" title="Belopolyane" lat="41.45" lon="26.15" dist="4988.7" primary="" />
      <gs pageid="27460862" ns="0" title="Siv Kladenets" lat="41.416666666667" lon="26.166666666667" dist="5713.5" primary="" />
      <gs pageid="13811116" ns="0" title="Svirachi" lat="41.483333333333" lon="26.116666666667" dist="6521.9" primary="" />
      <gs pageid="27460810" ns="0" title="Gorno Lukovo" lat="41.366666666667" lon="26.1" dist="6613.4" primary="" />
      <gs pageid="27460799" ns="0" title="Dolno Lukovo" lat="41.366666666667" lon="26.083333333333" dist="6746.2" primary="" />
      <gs pageid="27460827" ns="0" title="Kondovo" lat="41.433333333333" lon="26.016666666667" dist="6937" primary="" />
      <gs pageid="27460848" ns="0" title="Plevun" lat="41.45" lon="26.016666666667" dist="7383.1" primary="" />
      <gs pageid="24179704" ns="0" title="Villa Armira" lat="41.499069444444" lon="26.106263888889" dist="8130" primary="" />
      <gs pageid="27460871" ns="0" title="Zhelezari" lat="41.413333333333" lon="25.998333333333" dist="8540.1" primary="" />
    </geosearch>
  </query>
</api>

But while I am actually trying to get some pictures of those pages, subsequent calls are needed, like

  • to get some page images

    http://en.wikipedia.org/w/api.php?action=query&prop=images&pageids=13843906

  • then, to get image info

    http://en.wikipedia.org/w/api.php?action=query&titles=File:Alexandru_Ioan_Cuza_Dealul_Patriarhiei.jpg&prop=imageinfo&iiprop=url

Well, even if this gets me what I ultimately need, it is not efficient at all.

I would like to know if there are some parameters for this calls, or maybe completely other call(s) that would bring all this info in maximum 2 steps/calls. It would be great, though, if it would be only one.

like image 340
Michael Avatar asked Jul 02 '14 11:07

Michael


2 Answers

Wow, I had no idea that such a feature exists nowadays! But to answer your question, since it's a list query, you can probably use it as a generator.

Let's try it:

  • Original geosearch query: http://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=41.426140|26.099319

  • Generator query to get images on matching pages: http://en.wikipedia.org/w/api.php?action=query&prop=images&imlimit=max&generator=geosearch&ggsradius=10000&ggscoord=41.426140|26.099319

The prop=images query can also be used as a generator, so you can also do this:

  • Get URLs for all images on a list of pages: http://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=url&generator=images&gimlimit=max&pageids=13811116|24179704|27460781|27460799|27460810|27460827|27460829|27460848|27460862|27460871

Alas, AFAIK you can't nest generators, so you can't do both steps in one query. You can either:

  1. get the list of images in one query, and then use another query to get the URLs, or
  2. start with the basic geosearch query to get the page IDs, and then get the images and their URLs in another query.

Alas, it turns out that both of these options fail to give you some information that you may want. If you use list=geosearch as a generator, you don't get the coordinate information that you may need if you e.g. wish to display the results on a map. On the other hand, using prop=images as a generator makes you miss out on something even more important: the knowledge of which images are used on which pages!

Thus, unfortunately, it seems that, if your goal is to place images on a map, you'll probably have to do it with three separate queries. At least you can still query multiple pages / images in one request, so you shouldn't need more than three (until you hit the query limits and need to use continuations, that is).

(Also, doing it in three steps lets you apply some filtering to the images before the third step. For example, most of the pages returned by your example query only have the same three images — Flag of Bulgaria.svg, Ivaylovgrad Reservoir.jpg and Oblast Khaskovo.png — all of which are used via templates, and none of which really look like good choices to represent the specific location.)

Ps. If you're just interested in finding images near a particular location, even if they're not used on any specific Wikipedia article, you might want to try using geosearch directly on Wikimedia Commons. It doesn't seem to return any results for your Bulgarian example coordinates, but it works just fine in a more crowded location.

like image 69
Ilmari Karonen Avatar answered Oct 12 '22 09:10

Ilmari Karonen


Here is an alternative to build on the previous answer. If you start with this query as a partial answer:

https://en.wikipedia.org/w/api.php?action=query&prop=images&imlimit=max&generator=geosearch&ggsradius=10000&ggscoord=41.426140|26.099319

Then you can build on this to get the information in a single query. The pageimages property can work with the generator. You cannot nest generators but you can chain properties. A query can use pageimages to get the page's main image url for each of the geosearch results. It looks like this:

https://en.wikipedia.org/w/api.php?action=query&prop=images|pageimages&pilimit=max&piprop=thumbnail&iwurl=&imlimit=max&generator=geosearch&ggsradius=10000&ggscoord=41.426140|26.099319

This query returns the image "File" names (images property) and a single URL for the main image (pageimages property). The main image of the page is all I need. You might be able to extrapolate the "file" urls by matching the changes from the file to the url that is output with the query but I cannot recommend such a hack.

The images property has a setting that is supposed to return urls for interwiki links, iwurl. I see the "file" as an interwiki link. This parameter is not working and images does not return a url. Playing on the sandbox might lead you to a better answer.

Intuitively it seems like you should be able to chain the images and imageinfo properties together. Doing so does not give the expected results.

If a single url for the main image of the page is not enough I can encourage you to play in the API sandbox to try and get what you need with some combination of properties. I am using the geosearch generator and get the page image, text description, and lat/long coordinates so that I can get the address. Good luck!

like image 27
Beth Mezias Avatar answered Oct 12 '22 11:10

Beth Mezias