
Get a list of disambiguated homonyms from Wikipedia / Wikidata / linked data

If I search for "George Bush" manually on Wikipedia, I get this page, which lists homonyms with short descriptions.

I would like to feed my search to an API and get the following info:

  • George H. W. Bush
  • George W. Bush
  • George Bush (biblical scholar)
  • George Bush (footballer)
  • George Bush (racing driver)
  • George P. Bush
  • George Washington Bush

I don't mind getting more as long as I can unambiguously parse it.

My goal is to let a website's users tag a public person while restricting their choices and avoiding ambiguity. The list could therefore differ slightly from the one above, and any other decent database with an API would do.

I haven't figured out how to do this with either Wikipedia or Wikidata. I have only managed to query a specific ID/page once I already know it, which isn't the case here.

Moody_Mudskipper asked Sep 23 '18 09:09


1 Answer

There are a couple of ways to do this, depending on what sort of data you want.

For example, https://en.wikipedia.org/w/api.php?action=query&titles=George%20Bush&prop=links lists the pages linked from the "George Bush" page. Because that title is a disambiguation page, its links include the homonyms for that person's name.

That will return:

                {
                    "ns": 0,
                    "title": "Bush family"
                },
                {
                    "ns": 0,
                    "title": "George Brush (disambiguation)"
                },
                {
                    "ns": 0,
                    "title": "George Bush (biblical scholar)"
                },
                {
                    "ns": 0,
                    "title": "George Bush (footballer)"
                },
                {
                    "ns": 0,
                    "title": "George Bush (racing driver)"
                },
                {
                    "ns": 0,
                    "title": "George H. W. Bush"
                },
                {
                    "ns": 0,
                    "title": "George P. Bush"
                },
                {
                    "ns": 0,
                    "title": "George W. Bush"
                },
                {
                    "ns": 0,
                    "title": "George Washington Bush"
                }
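
The link list above can be pulled out programmatically. The following is a minimal Python sketch, not a definitive implementation. It assumes the response is requested with `format=json` in the API's default layout, where links are nested under `query.pages.<pageid>.links`. The `plnamespace` and `pllimit` parameters are additions to the URL above (they restrict results to articles and raise the per-request link limit), and the function names and User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_links_url(title, limit="max"):
    """Build a prop=links query URL for a single page title."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "links",
        "plnamespace": 0,  # assumption: only article-namespace links are wanted
        "pllimit": limit,  # assumption: ask for as many links as allowed per request
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_link_titles(response):
    """Collect link titles from a decoded prop=links response."""
    titles = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for link in page.get("links", []):
            titles.append(link["title"])
    return titles


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "homonym-demo/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling `extract_link_titles(fetch_json(build_links_url("George Bush")))` would return every linked title, including non-person entries such as "Bush family", so some filtering is still needed before showing the list to users.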

To get more data per result (page ID, size, and a text snippet), use the search list instead: https://en.wikipedia.org/w/api.php?action=query&utf8=&list=search&srsearch=George%20Bush

That will get you:

    "search": [
        {
            "ns": 0,
            "title": "George W. Bush",
            "pageid": 3414021,
            "size": 299185,
            "wordcount": 27007,
            "snippet": "<span class=\"searchmatch\">George</span> Walker <span class=\"searchmatch\">Bush</span> (born July 6, 1946) is an American politician who served as the 43rd President of the United States from 2001 to 2009. He had previously",
            "timestamp": "2018-09-26T21:48:08Z"
        },
        {
            "ns": 0,
            "title": "George H. W. Bush",
            "pageid": 11955,
            "size": 210189,
            "wordcount": 20867,
            "snippet": "<span class=\"searchmatch\">George</span> Herbert Walker <span class=\"searchmatch\">Bush</span> (born June 12, 1924) is an American politician who served as the 41st President of the United States from 1989 to 1993. Prior",
            "timestamp": "2018-10-01T06:41:50Z"
        },
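
For a tagging UI, each search hit can be reduced to a page ID, a title, and a plain-text description. The sketch below is one way to do that under stated assumptions: the response is requested with `format=json`, the hits sit under `query.search` as in the excerpt above, and the snippet markup is simple enough to strip with a regular expression. The function names are hypothetical, and `srlimit` is an addition to the URL above.

```python
import html
import re
import urllib.parse

API_URL = "https://en.wikipedia.org/w/api.php"

# Matches any HTML tag, e.g. <span class="searchmatch"> and </span>.
_TAG_RE = re.compile(r"<[^>]+>")


def build_search_url(query, limit=10):
    """Build a list=search query URL for a free-text search."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,  # assumption: cap on the number of hits returned
        "utf8": "",
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def clean_snippet(snippet):
    """Strip highlight tags and decode HTML entities in a search snippet."""
    return html.unescape(_TAG_RE.sub("", snippet))


def summarize_results(response):
    """Reduce each search hit to the fields a tagging UI would need."""
    hits = response.get("query", {}).get("search", [])
    return [
        {
            "pageid": hit["pageid"],
            "title": hit["title"],
            "description": clean_snippet(hit["snippet"]),
        }
        for hit in hits
    ]
```

Storing the `pageid` rather than the title is probably the safer choice for unambiguous tags, since titles can change when a page is moved.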
Terence Eden answered Nov 15 '22 10:11