I have a SPARQL query that asks for certain properties of URIs of a given type. As I am not sure whether those properties exist, I use the OPTIONAL keyword:
PREFIX mbo: <http://creativeartefact.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?uri a mbo:LiveMusicEvent.
OPTIONAL {?uri rdfs:label ?label}.
OPTIONAL {?uri mbo:organisedBy ?organiser}.
OPTIONAL {?uri mbo:takesPlaceAt ?venue}.
OPTIONAL {?uri mbo:begin ?begin}.
OPTIONAL {?uri mbo:end ?end}.
}
When I run this query against my SPARQL endpoint (Virtuoso server), I get the error:
Virtuoso 42000 Error The estimated execution time -721420288 (sec) exceeds the limit of 400 (sec).
When I remove OPTIONAL clauses one at a time, the estimated execution time is 4106 seconds after the first removal; once I remove two clauses, the query executes and returns the values instantly.
I cannot see why the estimated execution time skyrockets like this with the additional OPTIONAL clauses. Maybe my query is just constructed incorrectly?
OPTIONAL patterns are generally expensive for a SPARQL engine to evaluate (compared to "normal" join patterns). In this case, the error indicates that Virtuoso's query planner estimates the query to be too complex to complete within the configured time limit (note that this is only an estimate, so the precise value may be wrong).
You have several alternatives. Most of them involve doing more than one query, though. A common one is the "retrieve-and-iterate" pattern: you first do a query that retrieves all instances of mbo:LiveMusicEvent:
SELECT ?uri WHERE { ?uri a mbo:LiveMusicEvent }
and then you iterate over the result and retrieve each instance's optional properties:
SELECT *
WHERE { VALUES ?uri { <http://example.org/instance1> }
OPTIONAL {?uri rdfs:label ?label}.
OPTIONAL {?uri mbo:organisedBy ?organiser}.
OPTIONAL {?uri mbo:takesPlaceAt ?venue}.
OPTIONAL {?uri mbo:begin ?begin}.
OPTIONAL {?uri mbo:end ?end}.
}
As you can see, I use a VALUES clause to insert the instance ids returned by the first query into this second query. In this example, I am assuming you iterate one by one and therefore do a query for each instance, but as a further optimization you might experiment with putting more than one instance into the VALUES clause in one go (obviously not all of them at once, though, as that would make the query as complex as the original one).
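The batching idea could be sketched roughly as follows in Python. This is only an illustration under my own assumptions: the names chunked and build_values_query, the batch size of 2, and the example URIs are all hypothetical, and the code only builds the query strings. Sending them to the endpoint is left out, and the URIs are assumed to be valid IRIs that need no escaping.

```python
from typing import Iterator, List, Sequence

PREFIXES = (
    "PREFIX mbo: <http://creativeartefact.org/ontology/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)

# (property, variable name) pairs taken from the query in the question.
OPTIONAL_PROPERTIES = [
    ("rdfs:label", "label"),
    ("mbo:organisedBy", "organiser"),
    ("mbo:takesPlaceAt", "venue"),
    ("mbo:begin", "begin"),
    ("mbo:end", "end"),
]


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_values_query(uris: Sequence[str]) -> str:
    """Build one SELECT query whose VALUES clause lists the given URIs."""
    values = " ".join(f"<{uri}>" for uri in uris)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?uri {prop} ?{var} }}"
        for prop, var in OPTIONAL_PROPERTIES
    )
    return (
        f"{PREFIXES}SELECT * WHERE {{\n"
        f"  VALUES ?uri {{ {values} }}\n"
        f"{optionals}\n"
        f"}}"
    )


# Hypothetical URIs standing in for the results of the first query.
event_uris = [f"http://example.org/instance{i}" for i in range(1, 6)]
queries = [build_values_query(batch) for batch in chunked(event_uris, 2)]
```

Each string in queries can then be sent to the endpoint in turn. A suitable batch size is something you would have to find by experiment, since it depends on how the planner's cost estimate grows with the number of instances.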
By the way, VALUES is a SPARQL 1.1 feature, and I am not certain that Virtuoso supports it. If not, you can achieve the same effect either by using a FILTER clause or by just 'manually' replacing all occurrences of the variable ?uri with the instance id for each iteration.
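If VALUES turns out not to be available, a FILTER-based variant might look like the Python sketch below. It assumes a SPARQL 1.0 engine, so it chains equality tests with || rather than using the IN operator (which is also a SPARQL 1.1 addition), and it keeps the type triple so that ?uri is bound by a required pattern rather than only inside the OPTIONAL blocks. The function name build_filter_query and the example URIs are hypothetical.

```python
from typing import Sequence


def build_filter_query(uris: Sequence[str]) -> str:
    """Build a SELECT query that restricts ?uri with a FILTER, not VALUES."""
    if not uris:
        raise ValueError("at least one URI is required")
    # SPARQL 1.0 has no IN operator, so chain equality tests with ||.
    condition = " || ".join(f"?uri = <{uri}>" for uri in uris)
    return (
        "PREFIX mbo: <http://creativeartefact.org/ontology/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT * WHERE {\n"
        "  ?uri a mbo:LiveMusicEvent .\n"
        "  OPTIONAL { ?uri rdfs:label ?label }\n"
        "  OPTIONAL { ?uri mbo:organisedBy ?organiser }\n"
        "  OPTIONAL { ?uri mbo:takesPlaceAt ?venue }\n"
        "  OPTIONAL { ?uri mbo:begin ?begin }\n"
        "  OPTIONAL { ?uri mbo:end ?end }\n"
        f"  FILTER ({condition})\n"
        "}"
    )


# Hypothetical instance URIs, as they might come back from the first query.
query = build_filter_query([
    "http://example.org/instance1",
    "http://example.org/instance2",
])
```

Whether the engine evaluates this as cheaply as the VALUES form depends on its optimizer, so it is worth checking the estimated cost with a small batch first.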
Another way to handle it is to first do a CONSTRUCT query that retrieves a relevant subset of data from the larger source, and then do your more complex query with optionals on that subset. For example:
CONSTRUCT
WHERE {
?uri a mbo:LiveMusicEvent;
?p ?o .
}
will retrieve all data about the LiveMusicEvent instances as an RDF graph. Pop that graph into a local RDF model (e.g. a Sesame Model or in-memory Repository if you're working in Java), and query it further from there.