
Getting DBPedia Infobox categories

I'm looking for a way to query DBPedia's Infobox Ontology through the SPARQL endpoint to get a list of the classes, the subclasses of a selected class, and the properties of a given class. All the examples I've found assume that you either already know the property you want or are searching for something specific (populations of cities above a certain elevation, for example), whereas I'd like to build something that lets a user effectively "browse" the categories: for example, start with the list of subclasses of "owl:Thing" on this class hierarchy chart, then present the list of subclasses of whichever subclass the user selects. It seems possible to browse something like this via the mappings wiki, but it would be preferable to query the SPARQL endpoint directly.

Is there some simple SPARQL query that would return the available classes and properties of those classes?

Update: I seem to have found a way to get the class hierarchy, by repeatedly running this query:

SELECT ?subject WHERE {
     ?subject rdfs:subClassOf owl:Thing
}

This returns a list of the subclasses of owl:Thing. If I replace owl:Thing with one of those subclasses, I get the list of its subclasses, and so on until there are no subclasses left, at which point I can select all the resources whose type is the chosen subclass. I'm still not quite sure how to get all the properties common to the subclass, though.

Update 2: Getting closer now. This query gets me all the properties (typed as rdf:Property) that are used on resources of type Country, along with their labels:

SELECT DISTINCT ?prop ?title WHERE {
     ?country ?prop ?value.
     ?country a <http://dbpedia.org/ontology/Country>.
     ?prop rdf:type rdf:Property.
     ?prop rdfs:label ?title
}

That is actually all I really asked for. The last thing I'm trying to do is order these by the number of pages in which they appear (presumably the most common properties will be the ones of greatest interest).

Paul asked Mar 19 '11 20:03



3 Answers

OK, so I've figured out more or less exactly how to do this, so I'm submitting it as an answer rather than just an edit. What seems to give me exactly what I'm looking for is to start by iterating through the class hierarchy using this query:

SELECT ?class ?label WHERE {
     ?class rdfs:subClassOf owl:Thing.
     ?class rdfs:label ?label. 
     FILTER(lang(?label) = "en")
}

Each time, I feed the selected result back into the query in place of owl:Thing.
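
The iteration described above can be sketched in Python. This is only a minimal illustration, not the code actually used for this answer: `subclass_query`, `walk` and the `fetch_children` callback are hypothetical names, and the tiny in-memory hierarchy stands in for real endpoint results so the sketch runs offline.

```python
from typing import Callable, Dict, List

OWL_THING = "http://www.w3.org/2002/07/owl#Thing"

# Same query as above, with the parent class left as a placeholder.
SUBCLASS_QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {{
    ?class rdfs:subClassOf <{parent}> .
    ?class rdfs:label ?label .
    FILTER(lang(?label) = "en")
}}"""


def subclass_query(parent_iri: str) -> str:
    """Return the subclass query with parent_iri in place of owl:Thing."""
    return SUBCLASS_QUERY.format(parent=parent_iri)


def walk(root: str, fetch_children: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Breadth-first walk of the hierarchy.

    fetch_children(parent) stands in for running subclass_query(parent)
    against the endpoint and returning the ?class values.
    """
    tree: Dict[str, List[str]] = {}
    queue = [root]
    while queue:
        parent = queue.pop(0)
        if parent in tree:  # already expanded, avoid looping on cycles
            continue
        children = fetch_children(parent)
        tree[parent] = children
        queue.extend(children)
    return tree


# Illustrative stand-in data (assumption), not actual query results.
FAKE_HIERARCHY = {
    OWL_THING: ["http://dbpedia.org/ontology/Place"],
    "http://dbpedia.org/ontology/Place": ["http://dbpedia.org/ontology/PopulatedPlace"],
}

tree = walk(OWL_THING, lambda iri: FAKE_HIERARCHY.get(iri, []))
```

A class that maps to an empty list in `tree` has no subclasses, which is the point at which the user has reached the lowest level.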

Once the user has selected the lowest-level class they want, I use this query to list its properties in descending order of the number of entries in which they appear:

SELECT ?prop ?title WHERE {
     ?country ?prop [].
     ?country a <http://dbpedia.org/ontology/Country>.
     ?prop rdf:type rdf:Property.
     ?prop rdfs:label ?title
}
# Group per property so that the count is computed for each one
GROUP BY ?prop ?title
ORDER BY DESC(COUNT(DISTINCT ?country))

Of course, if you actually look at those results, there are some funky properties there that don't have very descriptive labels ("s"? What?), but this is at least what I was looking for in the first place.
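
As a rough sketch of how the results might be consumed, the Python below builds a variant of the query that also returns the count (as `?uses`, a name introduced here) and reads a response in the standard SPARQL JSON results format. The `min_label_len` cut-off is just one hypothetical heuristic for hiding labels like "s", and the sample payload, including its counts, is made up for illustration.

```python
import json
from typing import List, Tuple


def property_count_query(class_iri: str) -> str:
    """Properties used on instances of class_iri, most used first."""
    return f"""\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?prop ?title (COUNT(DISTINCT ?s) AS ?uses) WHERE {{
    ?s a <{class_iri}> .
    ?s ?prop [] .
    ?prop rdf:type rdf:Property .
    ?prop rdfs:label ?title
}} GROUP BY ?prop ?title ORDER BY DESC(?uses)"""


def parse_rows(payload: str, min_label_len: int = 2) -> List[Tuple[str, str, int]]:
    """Turn a SPARQL JSON result into (property, label, uses) tuples.

    Labels shorter than min_label_len are skipped (an assumed heuristic).
    """
    rows = []
    for binding in json.loads(payload)["results"]["bindings"]:
        title = binding["title"]["value"]
        if len(title) < min_label_len:
            continue
        rows.append((binding["prop"]["value"], title, int(binding["uses"]["value"])))
    return rows


# Made-up sample response; the counts are not real DBpedia figures.
SAMPLE = json.dumps({
    "head": {"vars": ["prop", "title", "uses"]},
    "results": {"bindings": [
        {"prop": {"type": "uri", "value": "http://dbpedia.org/property/capital"},
         "title": {"type": "literal", "value": "capital"},
         "uses": {"type": "literal", "value": "200"}},
        {"prop": {"type": "uri", "value": "http://dbpedia.org/property/s"},
         "title": {"type": "literal", "value": "s"},
         "uses": {"type": "literal", "value": "50"}},
    ]},
})

rows = parse_rows(SAMPLE)
```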

Paul answered Sep 28 '22 18:09



(1) Query for all existing classes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?class
WHERE {
  ?s rdf:type ?class .
}

(2) Query for all properties used in any instance of class C:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?property
WHERE {
  ?s rdf:type <C> .
  ?s ?property ?o
}
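
A minimal sketch of filling in the `<C>` placeholder from Python and turning the query into a request URL. The function names are hypothetical, and `https://dbpedia.org/sparql` is assumed to be the endpoint being targeted. The query is passed in the `query` parameter of a GET request, as the SPARQL protocol defines. The check on the class IRI rejects characters that SPARQL does not allow between `<` and `>`, so a value chosen by a user cannot break out of the IRI.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://dbpedia.org/sparql"  # assumed target endpoint

# Characters that may not appear inside <...> in SPARQL.
_ILLEGAL_IN_IRI = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def properties_query(class_iri: str) -> str:
    """Query (2) above with class_iri substituted for C."""
    if _ILLEGAL_IN_IRI.search(class_iri):
        raise ValueError(f"not usable as an IRI: {class_iri!r}")
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "SELECT DISTINCT ?property\n"
        "WHERE {\n"
        f"  ?s rdf:type <{class_iri}> .\n"
        "  ?s ?property ?o\n"
        "}"
    )


def request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """URL for a GET request carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = properties_query("http://dbpedia.org/ontology/Country")
url = request_url(query)

# Decoding the URL again gives back the query text unchanged.
round_trip = parse_qs(urlsplit(url).query)["query"][0]
```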
Manuel Salvadores answered Sep 28 '22 16:09



This will get you all of the properties whose rdfs:domain is SpaceMission:

select ?property where {
    ?property rdfs:domain <http://dbpedia.org/ontology/SpaceMission>
}

These properties all accept SpaceMission as the subject.

Note that in RDF(S), an explicit rdfs:domain statement is not required for each property, so a property can be used with instances of a class that was never declared as its domain. You may therefore find that this query gives you a list of all properties that have been defined with a domain of SpaceMission, but not a list of all properties that are actually used with instances of SpaceMission.
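
To see how far the two lists diverge for a given class, one option is to run both kinds of query (declared domain, and actual usage on instances) and compare the results as sets. The sketch below shows only the comparison step; `compare_properties` is a hypothetical helper and the `example.org` IRIs are placeholders, not real DBpedia properties.

```python
from typing import Dict, Set


def compare_properties(declared: Set[str], used: Set[str]) -> Dict[str, Set[str]]:
    """Split properties by whether they are declared, used, or both.

    declared: properties whose rdfs:domain is the class
    used:     properties that appear on instances of the class
    """
    return {
        "declared_and_used": declared & used,
        "declared_only": declared - used,
        "used_only": used - declared,
    }


# Placeholder IRIs (assumption), standing in for the two query results.
declared = {"http://example.org/prop/a", "http://example.org/prop/b"}
used = {"http://example.org/prop/b", "http://example.org/prop/c"}

report = compare_properties(declared, used)
```

Properties under `used_only` are the ones a domain-based query would miss.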

Jonathan answered Sep 28 '22 16:09
