
How to know if the Wikipedia content returned by the API is a useful article or an ambiguous one

I can get a Wikipedia article in XML or any other format. But for a given term, I first want to know whether the returned text is a full article or just a list of ambiguous terms similar to the one I entered.

For example, "SEO" is an ambiguous (or redirect) term, while "New York" returns a complete article. How can I tell this from the results?

EDIT

To put it simply: I have 400 city names and I want to fetch the Wikipedia content for each of them through the API. I want to discard any page that is not a city article and only contains a redirect or a list of ambiguous terms.

AgA asked Mar 13 '12 12:03



2 Answers

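You can check for the "disambiguation" page property by requesting prop=pageprops with ppprop=disambiguation. Disambiguation pages come back with that property set; adding the redirects parameter makes the API follow redirects and report them in the response: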

http://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=disambiguation&redirects&format=xml&titles=BNI
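Here is a minimal Python sketch of how the result of that query could be interpreted. It assumes format=json instead of format=xml and the default JSON layout (pages keyed by page ID under "query"); the function names `build_query_url`, `classify_pages` and `redirect_map` are made up for illustration. The actual HTTP request is left out so the logic can be tried on a saved response.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles):
    """Build the pageprops query from the answer, but asking for JSON."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "disambiguation",
        "redirects": "1",          # let the API resolve redirects
        "format": "json",          # assumption: JSON is easier to parse than XML
        "titles": "|".join(titles),
    }
    return API_URL + "?" + urlencode(params)


def classify_pages(response):
    """Map each returned page title to 'missing', 'disambiguation' or 'article'."""
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        if "missing" in page:
            status = "missing"
        elif "disambiguation" in page.get("pageprops", {}):
            status = "disambiguation"
        else:
            status = "article"
        result[page["title"]] = status
    return result


def redirect_map(response):
    """Map each requested title that was a redirect to its target title."""
    redirects = response.get("query", {}).get("redirects", [])
    return {r["from"]: r["to"] for r in redirects}
```

For the 400 city names, the titles would have to be sent in batches, since the API caps the number of titles per request (50 for ordinary clients, as far as I know). Anything classified as "disambiguation" or "missing" can then be discarded.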

user2976654 answered Sep 28 '22 10:09



All disambiguation pages are in the aptly named category All disambiguation pages, so you can just check for that category.
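A rough sketch of the category check in Python, assuming the query uses prop=categories with the clcategories filter so that only the one category is reported, and again assuming the default JSON layout. The names `build_category_url` and `in_disambiguation_category` are illustrative only.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
DISAMBIG_CATEGORY = "Category:All disambiguation pages"


def build_category_url(titles):
    """Ask only whether each page is in the disambiguation category."""
    params = {
        "action": "query",
        "prop": "categories",
        "clcategories": DISAMBIG_CATEGORY,  # filter to this single category
        "redirects": "1",
        "format": "json",
        "titles": "|".join(titles),
    }
    return API_URL + "?" + urlencode(params)


def in_disambiguation_category(response):
    """Map each returned page title to True if the category is listed for it."""
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        listed = {c["title"] for c in page.get("categories", [])}
        result[page["title"]] = DISAMBIG_CATEGORY in listed
    return result
```

Pages that are not in the category simply come back without a "categories" entry, which is why the function treats a missing list as False.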

As an alternative, you could check for the presence of the Disambiguation template, or one of its variants and their redirects.
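The template check could look something like the following, assuming the page's transcluded templates were fetched with prop=templates in JSON format. The set of template names is something you would have to maintain yourself; only "Template:Disambiguation" is filled in here, and the function name `uses_disambiguation_template` is hypothetical.

```python
# Assumption: extend this set with whichever variants and redirects you care about.
DISAMBIG_TEMPLATES = {"Template:Disambiguation"}


def uses_disambiguation_template(response, template_names=DISAMBIG_TEMPLATES):
    """Map each returned page title to True if it transcludes a listed template."""
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        used = {t["title"] for t in page.get("templates", [])}
        # True when the page uses at least one of the known templates
        result[page["title"]] = bool(used & set(template_names))
    return result
```

Because the list of variants has to be kept up to date by hand, this is more fragile than the category check.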

svick answered Sep 28 '22 11:09
