
Alternative for OPTIONAL Keyword in SPARQL-Queries?

I have a SPARQL query that asks for certain properties of URIs of a given type. As I am not sure whether those properties exist, I use the OPTIONAL keyword:

PREFIX mbo: <http://creativeartefact.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
SELECT * WHERE {
  ?uri a mbo:LiveMusicEvent. 
    OPTIONAL {?uri rdfs:label ?label}. 
    OPTIONAL {?uri mbo:organisedBy ?organiser}. 
    OPTIONAL {?uri mbo:takesPlaceAt ?venue}. 
    OPTIONAL {?uri mbo:begin ?begin}. 
    OPTIONAL {?uri mbo:end ?end}. 
}

When I run this query against my SPARQL endpoint (Virtuoso Server), I get the error:

Virtuoso 42000 Error The estimated execution time -721420288 (sec) exceeds the limit of 400 (sec).

When I reduce the number of OPTIONAL clauses, the estimate drops: after removing one clause the estimated execution time is 4106 seconds, and after removing two clauses the query is executed (and returns the values instantly).

I cannot see why the estimated execution time skyrockets like this with the additional OPTIONAL clauses. Or is my query simply constructed the wrong way?

asked Sep 01 '14 16:09 by Aaginor


1 Answer

OPTIONAL patterns are generally expensive for a SPARQL engine to evaluate (compared to "normal" join patterns). In this case, the error indicates that Virtuoso's query planner estimates the query to be too complex to perform within the set time limit. Note that this is only an estimate, so the precise value may be wrong; the negative figure in your error message suggests as much.

You have several alternatives. Most of them involve doing more than one query, though. A common pattern is the "retrieve-and-iterate" pattern - you first do a query that retrieves all instances of mbo:LiveMusicEvent:

 SELECT ?uri WHERE { ?uri a mbo:LiveMusicEvent } 

and then you iterate over the results and retrieve each instance's optional properties:

SELECT * 
WHERE { VALUES ?uri { <http://example.org/instance1> } 
        OPTIONAL {?uri rdfs:label ?label}. 
        OPTIONAL {?uri mbo:organisedBy ?organiser}. 
        OPTIONAL {?uri mbo:takesPlaceAt ?venue}. 
        OPTIONAL {?uri mbo:begin ?begin}. 
        OPTIONAL {?uri mbo:end ?end}. 
}

As you can see, I use a VALUES clause to insert the instance URIs returned by the first query into this second query. In this example, I am assuming you iterate one by one and therefore run one query per instance. As a further optimization, you might experiment with putting more than one instance into the VALUES clause at a time (obviously not all of them at once, though, as that would make the query as complex as the original one).
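The batching idea can be sketched on the client side roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names `chunked` and `build_values_query`, the batch size of 2, and the `http://example.org/instance…` URIs are all assumptions made for the example, and the code only builds the query strings (sending them to the endpoint is left out). It also assumes the URIs are well-formed IRIs that need no escaping.

```python
from typing import Iterator, List

# Prefixes taken from the question's query.
PREFIXES = (
    "PREFIX mbo: <http://creativeartefact.org/ontology/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)

# The optional properties from the question, one OPTIONAL block each.
OPTIONALS = (
    "  OPTIONAL { ?uri rdfs:label ?label }\n"
    "  OPTIONAL { ?uri mbo:organisedBy ?organiser }\n"
    "  OPTIONAL { ?uri mbo:takesPlaceAt ?venue }\n"
    "  OPTIONAL { ?uri mbo:begin ?begin }\n"
    "  OPTIONAL { ?uri mbo:end ?end }\n"
)


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_values_query(uris: List[str]) -> str:
    """Build one query whose VALUES clause lists the given instance URIs."""
    values = " ".join(f"<{u}>" for u in uris)
    return (
        PREFIXES
        + "SELECT * WHERE {\n"
        + f"  VALUES ?uri {{ {values} }}\n"
        + OPTIONALS
        + "}\n"
    )


# Placeholder URIs standing in for the results of the first query.
event_uris = [f"http://example.org/instance{i}" for i in range(1, 6)]

# One query per batch; each would be sent to the endpoint in turn.
queries = [build_values_query(batch) for batch in chunked(event_uris, 2)]
```

The right batch size depends on the endpoint and the data, so it is worth trying a few values and watching the planner's estimate.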

By the way, VALUES is a SPARQL 1.1 feature, and I am not certain that Virtuoso supports it. If not, you can achieve the same effect either by using a FILTER clause or by just 'manually' replacing all occurrences of the variable ?uri with the instance id for each iteration.
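A rough sketch of the FILTER variant, again only building the query string: the function name `build_filter_query` and the example URIs are hypothetical. One assumption goes beyond the text above: the type triple `?uri a mbo:LiveMusicEvent` is kept in the query, because without VALUES something else has to bind ?uri before the OPTIONAL blocks are applied. The filter uses only `=` and `||`, which are also available in SPARQL 1.0. Whether a given engine plans this as cheaply as the VALUES version is not guaranteed.

```python
from typing import List


def build_filter_query(uris: List[str]) -> str:
    """Build a query restricting ?uri with FILTER instead of VALUES."""
    if not uris:
        raise ValueError("at least one URI is required")
    # ?uri = <a> || ?uri = <b> || ...
    condition = " || ".join(f"?uri = <{u}>" for u in uris)
    return (
        "PREFIX mbo: <http://creativeartefact.org/ontology/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT * WHERE {\n"
        # Assumption: the type triple binds ?uri before the OPTIONALs.
        "  ?uri a mbo:LiveMusicEvent .\n"
        f"  FILTER ({condition})\n"
        "  OPTIONAL { ?uri rdfs:label ?label }\n"
        "  OPTIONAL { ?uri mbo:organisedBy ?organiser }\n"
        "  OPTIONAL { ?uri mbo:takesPlaceAt ?venue }\n"
        "  OPTIONAL { ?uri mbo:begin ?begin }\n"
        "  OPTIONAL { ?uri mbo:end ?end }\n"
        "}\n"
    )


# Placeholder URIs standing in for results of the first query.
query = build_filter_query([
    "http://example.org/instance1",
    "http://example.org/instance2",
])
```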

Another way to handle it is to first do a CONSTRUCT query that retrieves a relevant subset of data from the larger source, and then do your more complex query with optionals on that subset. For example:

 CONSTRUCT 
 WHERE { 
    ?uri a mbo:LiveMusicEvent; 
         ?p ?o . 
 }

will retrieve all data about the LiveMusicEvent instances as an RDF graph. Pop that graph into a local RDF model (e.g. a Sesame Model or in-memory Repository if you're working in Java), and query it further from there.

answered Sep 30 '22 10:09 by Jeen Broekstra