I have RDF in Turtle format as follows:
@prefix ab: <http://learningsparql.com/ns/addressbook#> .
@prefix d: <http://learningsparql.com/ns/data#> .
d:i0432 ab:firstName "Richard" .
d:i0432 ab:lastName "Mutt" .
d:i0432 ab:homeTel "(229) 276-5135" .
d:i0432 ab:nick "Dick" .
d:i0432 ab:email "[email protected]" .
d:i9771 ab:firstName "Cindy" .
d:i9771 ab:lastName "Marshall" .
d:i9771 ab:homeTel "(245) 646-5488" .
d:i9771 ab:email "[email protected]" .
d:i8301 ab:firstName "Craig" .
d:i8301 ab:lastName "Ellis" .
d:i8301 ab:workTel "(245) 315-5486" .
d:i8301 ab:email "[email protected]" .
d:i8301 ab:email "[email protected]" .
and the query is
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT ?first ?last
WHERE
{
?s ab:lastName ?last .
OPTIONAL {?s ab:nick ?first. }.
OPTIONAL {?s ab:firstName ?first .}.
}
the result is
------------------------
| first | last |
========================
| "Craig" | "Ellis" |
| "Cindy" | "Marshall" |
| "Dick" | "Mutt" |
------------------------
But if I change the query to
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT ?first ?last
WHERE
{
OPTIONAL {?s ab:nick ?first. }.
OPTIONAL {?s ab:firstName ?first .}.
?s ab:lastName ?last .
}
the result is
-------------------
| first | last |
===================
| "Dick" | "Mutt" |
-------------------
Can anyone explain what causes this difference? I thought the period in a SPARQL query was the same as an "and" operator.
OPTIONAL is a binary operator that combines two graph patterns. The optional pattern is any group pattern and may involve any SPARQL pattern types. If the group matches, the solution is extended; if not, the original solution is given (q-opt3.rq).
SPARQL allows for a query to consist of triple patterns, conjunctions, disjunctions, and optional patterns. Implementations for multiple programming languages exist. There exist tools that allow one to connect and semi-automatically construct a SPARQL query for a SPARQL endpoint, for example ViziQuer.
SPARQL sees your data as a directed, labeled graph, that is internally expressed as triples consisting of subject, predicate and object. Correspondingly, a SPARQL query consists of a set of triple patterns in which each element (the subject, predicate and object) can be a variable (wildcard).
The ordering is important here
The semantics of SPARQL queries are expressed via the SPARQL algebra and the two queries here produce very different algebra. I use the SPARQL Query Validator provided by the Apache Jena project (disclaimer - I am a committer on that project) to generate the algebra.
Your first query produces the following algebra:
(base <http://example/base/>
(prefix ((ab: <http://learningsparql.com/ns/addressbook#>))
(project (?first ?last)
(leftjoin
(leftjoin
(bgp (triple ?s ab:lastName ?last))
(bgp (triple ?s ab:nick ?first)))
(bgp (triple ?s ab:firstName ?first))))))
And your second query produces the following algebra:
(base <http://example/base/>
(prefix ((ab: <http://learningsparql.com/ns/addressbook#>))
(project (?first ?last)
(join
(leftjoin
(leftjoin
(table unit)
(bgp (triple ?s ab:nick ?first)))
(bgp (triple ?s ab:firstName ?first)))
(bgp (triple ?s ab:lastName ?last))))))
As you can see, the triple patterns in your query appear in a different order and the operators differ. Importantly, your second query has a join, which only preserves compatible solutions from both sides, whereas the first query uses only leftjoin, which preserves LHS solutions as-is if there are no compatible solutions.
So in the first query you first find things with an ab:lastName and then optionally add the ab:nick or ab:firstName if present, hence you get all the people in your data returned.
In the second query you first find things with an ab:nick and then optionally add things with an ab:firstName, before requiring that everything has an ab:lastName. Therefore you can only get the person with a nickname returned.
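To make the difference concrete, here is a minimal Python sketch that evaluates both algebra expressions over a subset of the sample data. It is an illustration only, not how Jena evaluates queries: solution mappings are modelled as plain dicts, and the names `bgp`, `compatible`, `join`, `leftjoin`, `project` and `UNIT` are made up for this example.

```python
# Illustrative model of SPARQL algebra; all helper names are hypothetical.
# Subset of the sample data as (subject, predicate, object) triples.
DATA = [
    ("i0432", "firstName", "Richard"), ("i0432", "lastName", "Mutt"),
    ("i0432", "nick", "Dick"),
    ("i9771", "firstName", "Cindy"), ("i9771", "lastName", "Marshall"),
    ("i8301", "firstName", "Craig"), ("i8301", "lastName", "Ellis"),
]

def bgp(predicate, var):
    """Match the single triple pattern `?s ab:<predicate> ?<var>`."""
    return [{"s": s, var: o} for s, p, o in DATA if p == predicate]

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def join(left, right):
    """Keep only merged pairs of compatible solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def leftjoin(left, right):
    """Like join, but keep a left solution unchanged if nothing on the right fits."""
    out = []
    for a in left:
        matches = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(matches or [a])
    return out

UNIT = [{}]  # (table unit): a single empty solution

def project(rows):
    """Project to (?first, ?last), sorted for stable display."""
    return sorted((r.get("first"), r.get("last")) for r in rows)

# First query: lastName first, then the two OPTIONALs.
first_query = leftjoin(
    leftjoin(bgp("lastName", "last"), bgp("nick", "first")),
    bgp("firstName", "first"),
)

# Second query: the OPTIONALs start from (table unit), then join with lastName.
second_query = join(
    leftjoin(leftjoin(UNIT, bgp("nick", "first")), bgp("firstName", "first")),
    bgp("lastName", "last"),
)

print(project(first_query))   # three people
print(project(second_query))  # only the person with a nick
```

In the second expression, the first leftjoin replaces the empty row with the single ab:nick match, so ?s is already bound to one subject before ab:lastName is ever considered.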
I thought the period in SPARQL query is the same as "and" operator.
No, it merely terminates a triple pattern and may optionally follow other clauses (but is not required to do so); it is not an "and" operator.
Adjacent basic graph patterns are joined unless an alternative join operator (e.g. leftjoin or minus) is implied by the presence of an OPTIONAL or MINUS clause.
What is table unit?
table unit is a special operator that corresponds to the empty graph pattern in a SPARQL query.
For example, SELECT * WHERE { } would produce the algebra (table unit).
It produces a single empty row, which in the semantics of SPARQL means it can be joined to anything and returns the other thing, so in essence it acts like a join identity. In many cases a SPARQL engine can simplify the algebra to remove table unit, since in most cases it has no effect on the semantics of the query.
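The join-identity behaviour can be sketched in a few lines of Python. This is a toy model under the assumption that solutions are dicts of variable bindings; `compatible`, `join`, `UNIT` and the sample rows are illustrative names, not part of any SPARQL library.

```python
# Toy model: a solution is a dict of variable bindings (illustrative only).

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def join(left, right):
    """Keep only merged pairs of compatible solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

UNIT = [{}]  # (table unit): exactly one row that binds no variables

rows = [
    {"s": "d:i0432", "last": "Mutt"},
    {"s": "d:i9771", "last": "Marshall"},
]

# The empty row shares no variables with anything, so it is compatible with
# every solution and merging it in changes nothing.
print(join(UNIT, rows) == rows)  # True
print(join(rows, UNIT) == rows)  # True
```

Because joining with the single empty row always gives back the other operand, an engine can drop it from a plain join without changing the results.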
In your first query there is technically another join between table unit and the join operator, but in the case of a normal join the presence of table unit will have no effect (as it's the join identity), and so it can be and is simplified out.
However, with an OPTIONAL the SPARQL specification requires that the algebra produced is a left join of the thing inside the clause with whatever the preceding clause was. In the case of your second query there is no preceding clause before your first OPTIONAL (technically there is an implicit empty graph pattern there), so the first leftjoin generated has table unit on its left-hand side. Unlike a normal join, the table unit has to be preserved in this case because the semantics of leftjoin say that the results from the LHS are preserved if there are no compatible solutions from the RHS.
We can illustrate this with a more trivial query:
SELECT *
WHERE
{
OPTIONAL { ?s a ?type }
}
Produces the algebra:
(base <http://example/base/>
(leftjoin
(table unit)
(bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type))))
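Why table unit cannot be dropped from a leftjoin can be seen in a small Python sketch. As before, this is only a toy model with made-up helper names (`compatible`, `join`, `leftjoin`, `UNIT`), and the `ab:Person` type in the sample row is a hypothetical value that does not appear in the data above.

```python
# Toy model: a solution is a dict of variable bindings (illustrative only).

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def join(left, right):
    """Keep only merged pairs of compatible solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def leftjoin(left, right):
    """Like join, but keep a left solution unchanged if nothing on the right fits."""
    out = []
    for a in left:
        matches = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(matches or [a])
    return out

UNIT = [{}]      # (table unit)
NO_MATCHES = []  # the OPTIONAL pattern matched nothing in the data
TYPED = [{"s": "d:i0432", "type": "ab:Person"}]  # hypothetical match

# With no match, the left-hand empty row survives: one result, all unbound.
optional_without_match = leftjoin(UNIT, NO_MATCHES)

# With a match, the empty row is extended and so replaced by the match.
optional_with_match = leftjoin(UNIT, TYPED)

# A plain join would instead return no rows at all when nothing matches.
plain_join_without_match = join(UNIT, NO_MATCHES)

print(optional_without_match)    # [{}]
print(optional_with_match)       # [{'s': 'd:i0432', 'type': 'ab:Person'}]
print(plain_join_without_match)  # []
```

The first case is the one that forces the engine to keep table unit: removing it would turn a single all-unbound row into an empty result.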
This question is old, but the answer is still hard to understand clearly. Allow me to try in plain English, with thanks to SPARQL_Order_Matters.
When OPTIONALS appear at the beginning of a query, they either
When OPTIONALS appear after some statement has already matched some data, they either
So the real non-obvious behavior happens when an OPTIONAL is first, and it matches some triples. Now all query results match the contents of that OPTIONAL.