
Retrieve dbpedia subject categories with SPARQL

Tags: sparql, dbpedia

Is there a way to retrieve all categories from dcterms:subject inside dbpedia?

As an example, in http://dbpedia.org/page/Eiffel_Tower I can see in dcterms:subject the following categories:

  • category:Former_world's_tallest_buildings
  • category:Places_with_restrictions_on_photography
  • category:Michelin_Guide_starred_restaurants_and_chefs
  • category:Historic_Civil_Engineering_Landmarks
  • category:1889_architecture
  • ...

I wish to retrieve all category:xxx values in dbpedia. Is there a way?

Asked by Luca, Jun 16 '11 16:06


1 Answer

If you run a COUNT query to see how many categories DBpedia contains, using the following SPARQL query:

SELECT (COUNT(DISTINCT ?category) AS ?count) WHERE {?subject dcterms:subject ?category}

you'll find that DBpedia had 503788 distinct categories at the time of writing. If you query for all of them at once, the endpoint will not return the full set, because it caps the number of results a single query can return. You can, however, issue multiple queries using LIMIT and OFFSET. For example, to get the first 1000 categories you can run the following query:

SELECT DISTINCT ?category WHERE {?subject dcterms:subject ?category} LIMIT 1000 OFFSET 0

I don't know how you are going to use this information, but my recommendation would be to run multiple queries, incrementing the offset each time (e.g. 1000, 2000, 3000), and to cache the results in whatever storage you are using. You can stop once a query returns fewer rows than the LIMIT. In practice, you can write a small program that executes the queries and places the results in the cache.
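The paging loop described above could look roughly like the following Python sketch. It is only an illustration under some assumptions: the endpoint URL `https://dbpedia.org/sparql`, the names `fetch_all_categories`, `run_query_http`, and the `max_pages` parameter are made up for this example, and the in-memory list stands in for whatever cache you actually use. The query is the same one as above with the `dcterms` prefix spelled out.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia SPARQL endpoint lives at this URL.
ENDPOINT = "https://dbpedia.org/sparql"

# Same query as in the answer, with the prefix declared explicitly.
# Without ORDER BY the page order is not guaranteed to be stable;
# adding "ORDER BY ?category" fixes that but may be slower.
QUERY_TEMPLATE = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?category
WHERE {{ ?subject dcterms:subject ?category }}
LIMIT {limit} OFFSET {offset}
"""


def run_query_http(query):
    """Send one query to the endpoint and return the ?category URIs."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["category"]["value"] for row in data["results"]["bindings"]]


def fetch_all_categories(run_query=run_query_http, page_size=1000, max_pages=None):
    """Page through the categories with LIMIT/OFFSET and collect them.

    run_query is injectable so the loop can be tried without network access.
    max_pages (hypothetical safety valve) stops after that many requests.
    """
    categories = []      # stands in for the cache / storage
    seen = set()         # guards against duplicates across pages
    offset = 0
    page = 0
    while max_pages is None or page < max_pages:
        rows = run_query(QUERY_TEMPLATE.format(limit=page_size, offset=offset))
        if not rows:
            break        # nothing left to fetch
        for uri in rows:
            if uri not in seen:
                seen.add(uri)
                categories.append(uri)
        if len(rows) < page_size:
            break        # a short page means this was the last one
        offset += page_size
        page += 1
    return categories
```

Calling `fetch_all_categories(max_pages=3)` would request offsets 0, 1000 and 2000 and return at most 3000 category URIs. Dropping `max_pages` walks the whole list, which means several hundred requests, so it is probably polite to add a short pause between them.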

Do remember, however, that the categories in DBpedia are hierarchical (linked with skos:broader), so one category can be the broader category of several others.
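To explore that hierarchy, one option is to ask for the categories that name a given category as their broader one. The helper below is a small sketch that only builds such a query string. The function name `subcategories_query` is invented for this example, and the category URI in the usage note is just the `1889_architecture` category from the question written as a full DBpedia URI; whether it has any narrower categories is not something this sketch assumes.

```python
def subcategories_query(category_uri, limit=1000):
    """Build a SPARQL query listing the direct subcategories of category_uri.

    A subcategory is any ?child with a skos:broader link to the given category.
    """
    # Reject characters that would break out of the <...> IRI syntax.
    if any(ch in category_uri for ch in '<>" '):
        raise ValueError("category_uri is not a plain IRI: %r" % category_uri)
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT DISTINCT ?child\n"
        f"WHERE {{ ?child skos:broader <{category_uri}> }}\n"
        f"LIMIT {int(limit)}"
    )
```

For example, `subcategories_query("http://dbpedia.org/resource/Category:1889_architecture")` returns a query you can paste into the endpoint. Running it repeatedly on each result walks down the category tree one level at a time.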

Answered by ip., Jan 01 '23 01:01