
Retrieving a DBpedia resource by its string name with SPARQL and without knowing its type

Like the author of this question, which has a similar title, I would like to retrieve a DBpedia resource knowing only part of its name. I'm a beginner with SPARQL, but the example in that question helped me a lot: the author searched for "Romania", and the person answering provided a SPARQL query that does the job. That's nice, but here's the thing.

In the example, they already "knew" that Romania is a country, hence the

    ?c a dbpedia-owl:Country ;

in the WHERE clause. The complete SPARQL query is:

    SELECT ?c
    WHERE {
      ?c a dbpedia-owl:Country ;
         foaf:name "Romania"@en .
      FILTER NOT EXISTS { ?c dbpedia-owl:dissolutionYear ?y }
    }

But that question doesn't fully answer my need, which is to search for ANY resource by its name (the actual name of the resource, or part of it), regardless of its rdf:type. The goal is to search for "anything", knowing only the name or part of it.
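
A minimal, hypothetical sketch of the type-agnostic version (Python, standard library only; the helper names are made up for illustration): dropping the `?c a dbpedia-owl:Country` triple pattern leaves a lookup on the name alone. It matches on `rdfs:label` instead of `foaf:name`, on the assumption that labels are present on more DBpedia resources, and it only finds exact, language-tagged matches.

```python
def escape_sparql_string(text: str) -> str:
    """Escape text for use inside a double-quoted SPARQL string literal."""
    replacements = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    # Map character by character so no escape is ever applied twice.
    return "".join(replacements.get(ch, ch) for ch in text)


def exact_name_query(name: str, lang: str = "en") -> str:
    """Hypothetical helper: find resources by exact label, with no rdf:type constraint."""
    literal = escape_sparql_string(name)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?s WHERE {\n"
        # No `?s a <SomeClass>` pattern: any resource with this label matches.
        f'  ?s rdfs:label "{literal}"@{lang} .\n'
        "}"
    )


if __name__ == "__main__":
    print(exact_name_query("Romania"))
```

Because the user's text is escaped before being placed in the literal, a name containing a double quote cannot terminate the string early and alter the query.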

I did some research before asking this question, and I already know that the "part of the name" problem could be solved with the bif:contains function (the bad way, since it isn't standard SPARQL) or with the CONTAINS function, but I couldn't find any example showing how to use it.
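
For reference, `CONTAINS` is a standard SPARQL 1.1 string function that is used inside a `FILTER`. The hypothetical helper below (Python, standard library only) builds a case-insensitive substring query from user input and escapes the input so quotes cannot break out of the string literal. This is exactly the FILTER approach the question worries about: the store may have to test every English label, so against the full DBpedia dataset it may be slow or hit the endpoint's timeout.

```python
def escape_sparql_string(text: str) -> str:
    """Escape text for use inside a double-quoted SPARQL string literal."""
    replacements = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    # Map character by character so no escape is ever applied twice.
    return "".join(replacements.get(ch, ch) for ch in text)


def contains_query(user_input: str, limit: int = 50) -> str:
    """Hypothetical helper: case-insensitive substring search over English
    rdfs:label values, using only standard SPARQL 1.1 (CONTAINS and LCASE)."""
    # Lowercase the search text here; LCASE lowercases each label on the server.
    needle = escape_sparql_string(user_input.lower())
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        '  FILTER (lang(?label) = "en")\n'
        f'  FILTER (CONTAINS(LCASE(STR(?label)), "{needle}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(contains_query("Obama"))
```

With "Obama" as input, this should match labels such as "Barack Obama" and "Michelle Obama", since any label containing the substring passes the filter.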

Now suppose there is a "word" to search for among the DBpedia resources, supplied as input by some user. Let's call it "INPUT".

I imagine the query would look something like this:

    SELECT ?something WHERE
    {
      ?something a (dbpedia Resource) .
      CONTAINS(?something, "INPUT")
    }

My question has two main aspects:

  1. Is there anything that describes the type "DBpedia resource"? I don't think it's in the ontology or anywhere else. Knowing that, I would like to search among all the resources to find one matching...
  2. ...a specific name I would provide, or some string. I considered the FILTER option, but that would mean getting ALL the resources and then filtering them by name after they have been retrieved, which I guess would not be very efficient.

So, does anyone know this "master query" to get a resource by providing its name, or part of it? (For example, providing "Obama" and getting results not only for Barack but for Michelle as well.)

Thank you in advance.

asked Dec 26 '11 at 13:12 by Ged ort


1 Answer

I'm assuming that your first question is about instance resources only. I don't know whether you can explicitly ask for just instance resources in the general case, since in RDF everything is a resource. If you specifically need this for the DBpedia dataset, you can query for resources that have dcterms:subject as a property (in DBpedia, only instance resources have a dcterms:subject). So you can use a query like this:

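    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dcterms: <http://purl.org/dc/terms/>

    # bif:contains is a Virtuoso extension, not standard SPARQL
    SELECT DISTINCT ?s ?label WHERE {
      ?s rdfs:label ?label .
      FILTER (lang(?label) = 'en')
      ?label bif:contains "Obama" .
      # keeps only resources that have a dcterms:subject (instances, in DBpedia)
      ?s dcterms:subject ?sub .
    }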
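
A sketch of building this query from arbitrary user input (Python, standard library only; `bif_contains_query` is a hypothetical name). It rests on two assumptions about Virtuoso's free-text syntax: a multi-word search must be wrapped in inner double quotes to be treated as a phrase (e.g. `'"Barack Obama"'`), and punctuation in the input can trigger free-text syntax errors, so the helper simply strips it. Both are simplifications, not a complete treatment of the `bif:contains` query language.

```python
import re


def bif_contains_query(user_input: str, lang: str = "en") -> str:
    """Hypothetical helper: the answer's Virtuoso-specific query for any input.

    Assumption: stripping punctuation is an acceptable way to keep the input
    from breaking either the SPARQL literal or Virtuoso's free-text syntax.
    """
    # Replace anything that is not a word character or whitespace with a space,
    # then split on whitespace to collapse the gaps.
    words = re.sub(r"[^\w\s]", " ", user_input).split()
    if not words:
        raise ValueError("search input contains no searchable words")
    # Inner double quotes make Virtuoso treat the words as one phrase.
    phrase = '"' + " ".join(words) + '"'
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT DISTINCT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        f"  ?label bif:contains '{phrase}' .\n"
        "  ?s dcterms:subject ?sub .\n"
        "}"
    )


if __name__ == "__main__":
    print(bif_contains_query("Barack Obama"))
```

Since only word characters and spaces survive, the single-quoted SPARQL literal needs no further escaping.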

Similarly, for your second question: if you are using just the DBpedia dataset, you might want to use bif:contains, although it is not SPARQL compliant. I don't think there is a better way to do this, and as you said, using FILTER will be sub-optimal, especially if you need queries to execute quickly. I think keyword search and indexing are handled ad hoc by each triple store; there is not yet a standardized way to do full-text search.

So, to sum up: if you work only with DBpedia, just use the features of the store and the specifics of the dataset to solve your problem.
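
As a minimal sketch of using the store from application code, the following sends a query to the public DBpedia endpoint (`https://dbpedia.org/sparql`) with only the Python standard library. It assumes the standard SPARQL protocol: the query goes in a `query` GET parameter, and an `Accept: application/sparql-results+json` header requests results in the SPARQL JSON results format. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Public DBpedia endpoint; point this elsewhere if you run your own mirror.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_request(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> urllib.request.Request:
    """Encode a SELECT query as a SPARQL-protocol GET request asking for JSON."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row.

    Variables left unbound in a row are simply absent from that row's dict.
    """
    return [
        {var: term["value"] for var, term in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return parse_bindings(json.load(resp))
```

For example, `run_query(...)` with the answer's query should return rows shaped like `{"s": ..., "label": ...}`. It needs network access, and the public endpoint may enforce its own timeouts and result-size limits.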

answered Nov 29 '22 at 05:11 by ip.