SPARQL Optional query

Tags: sparql

I have the following RDF in Turtle format

    @prefix ab: <http://learningsparql.com/ns/addressbook#> .
    @prefix d: <http://learningsparql.com/ns/data#> .
    d:i0432 ab:firstName "Richard" .
    d:i0432 ab:lastName "Mutt" .
    d:i0432 ab:homeTel "(229) 276-5135" .
    d:i0432 ab:nick "Dick" .
    d:i0432 ab:email "[email protected]" .
    d:i9771 ab:firstName "Cindy" .
    d:i9771 ab:lastName "Marshall" .
    d:i9771 ab:homeTel "(245) 646-5488" .
    d:i9771 ab:email "[email protected]" .
    d:i8301 ab:firstName "Craig" .
    d:i8301 ab:lastName "Ellis" .
    d:i8301 ab:workTel "(245) 315-5486" .
    d:i8301 ab:email "[email protected]" .
    d:i8301 ab:email "[email protected]" .

and the query is

    PREFIX ab: <http://learningsparql.com/ns/addressbook#>
    SELECT ?first ?last
    WHERE
    {
        ?s ab:lastName ?last .
        OPTIONAL {?s ab:nick ?first. }.
        OPTIONAL {?s ab:firstName ?first .}.
    }

the result is

    ------------------------
    | first   | last       |
    ========================
    | "Craig" | "Ellis"    |
    | "Cindy" | "Marshall" |
    | "Dick"  | "Mutt"     |
    ------------------------

but if I change the query to

    PREFIX ab: <http://learningsparql.com/ns/addressbook#>
    SELECT ?first ?last
    WHERE
    {
        OPTIONAL {?s ab:nick ?first. }.
        OPTIONAL {?s ab:firstName ?first .}.
        ?s ab:lastName ?last .
    }

the result is

    -------------------
    | first  | last   |
    ===================
    | "Dick" | "Mutt" |
    -------------------

Can anyone explain what causes this difference? I thought the period in a SPARQL query was the same as an "and" operator.

Willy asked Aug 05 '14 04:08


2 Answers

The ordering is important here.

The semantics of SPARQL queries are expressed via the SPARQL algebra, and the two queries here produce very different algebra. I used the SPARQL Query Validator provided by the Apache Jena project (disclaimer: I am a committer on that project) to generate the algebra.

Your first query produces the following algebra:

    (base <http://example/base/>
      (prefix ((ab: <http://learningsparql.com/ns/addressbook#>))
        (project (?first ?last)
          (leftjoin
            (leftjoin
              (bgp (triple ?s ab:lastName ?last))
              (bgp (triple ?s ab:nick ?first)))
            (bgp (triple ?s ab:firstName ?first))))))

And your second query produces the following algebra:

    (base <http://example/base/>
      (prefix ((ab: <http://learningsparql.com/ns/addressbook#>))
        (project (?first ?last)
          (join
            (leftjoin
              (leftjoin
                (table unit)
                (bgp (triple ?s ab:nick ?first)))
              (bgp (triple ?s ab:firstName ?first)))
            (bgp (triple ?s ab:lastName ?last))))))

As you can see, the triple patterns in your query appear in a different order and the operators differ. Importantly, your second query has a join, which only keeps solutions that are compatible on both sides, whereas the first query uses only leftjoin, which preserves LHS solutions as-is if there are no compatible solutions on the RHS.

So in the first query you first find things with an ab:lastName and then optionally add the ab:nick or ab:firstName if present, hence you get all the people in your data returned.

In the second query you first find things with an ab:nick and then optionally add an ab:firstName, before requiring that everything has an ab:lastName. By that point ?s is already bound to the only person who has a nickname, so the join can only return that person.
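The difference can be reproduced with a toy evaluator. The following Python sketch is not how Jena or any real engine works internally: it assumes simplified `bgp`, `join` and `leftjoin` helpers (hypothetical names) over lists of dictionaries, ignores the filter expression that a real leftjoin can carry, and leaves out the email triples because neither query touches them.

```python
# Toy model of the two algebra trees above; a sketch, not a SPARQL engine.
TRIPLES = [
    ("i0432", "firstName", "Richard"), ("i0432", "lastName", "Mutt"),
    ("i0432", "nick", "Dick"),
    ("i9771", "firstName", "Cindy"), ("i9771", "lastName", "Marshall"),
    ("i8301", "firstName", "Craig"), ("i8301", "lastName", "Ellis"),
]

UNIT = [{}]  # (table unit): exactly one solution with no bindings


def bgp(pred, var):
    """Solutions for the single triple pattern `?s ab:<pred> ?<var>`."""
    return [{"s": s, var: o} for s, p, o in TRIPLES if p == pred]


def compatible(a, b):
    """Two solutions are compatible if every shared variable has the same value."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def join(left, right):
    """Keep only merged pairs of compatible solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def leftjoin(left, right):
    """Like join, but keep a left solution unchanged when nothing on the right fits."""
    out = []
    for a in left:
        merged = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(merged or [a])
    return out


def project(rows):
    """SELECT ?first ?last, sorted so the output order is stable."""
    return sorted((r.get("first"), r["last"]) for r in rows)


first_query = leftjoin(
    leftjoin(bgp("lastName", "last"), bgp("nick", "first")),
    bgp("firstName", "first"),
)
second_query = join(
    leftjoin(leftjoin(UNIT, bgp("nick", "first")), bgp("firstName", "first")),
    bgp("lastName", "last"),
)

print(project(first_query))   # three people
print(project(second_query))  # only the person with a nickname
```

In the second tree the first leftjoin binds ?s and ?first to the one nickname match, the firstName pattern is incompatible with it ("Richard" is not "Dick"), and the final join can then only pick up that person's last name.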

Regarding "I thought the period in SPARQL query is the same as 'and' operator":

No, it merely terminates a triple pattern and may optionally follow other clauses (but is not required to do so); it is not an "and" operator.

Adjacent basic graph patterns are joined unless an alternative operator (e.g. leftjoin or minus) is implied by the presence of an OPTIONAL or MINUS clause.
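That rule can be sketched as a left-to-right fold over the elements of a group. The Python below is only an illustration under simplifying assumptions: `translate` and the `(kind, pattern)` tuples are hypothetical, patterns are plain strings, and FILTER, UNION and the merging of adjacent triple patterns into one BGP are all ignored.

```python
def translate(elements):
    """Fold group elements, in query order, into a nested algebra expression."""
    algebra = "(table unit)"  # every group starts from the empty pattern
    for kind, pattern in elements:
        if kind == "optional":
            algebra = f"(leftjoin {algebra} {pattern})"
        elif kind == "minus":
            algebra = f"(minus {algebra} {pattern})"
        elif algebra == "(table unit)":
            algebra = pattern  # joining with table unit changes nothing, so drop it
        else:
            algebra = f"(join {algebra} {pattern})"
    return algebra


first = [("bgp", "(bgp lastName)"),
         ("optional", "(bgp nick)"),
         ("optional", "(bgp firstName)")]
second = [("optional", "(bgp nick)"),
          ("optional", "(bgp firstName)"),
          ("bgp", "(bgp lastName)")]

print(translate(first))
print(translate(second))
```

The two printed expressions have the same shape as the validator output for the two queries: the operator chosen for each element depends on what kind of element it is, and its left operand is whatever came before it.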

Edit - What is table unit?

table unit is a special operator that corresponds to the empty graph pattern in a SPARQL query.

For example, SELECT * WHERE { } would produce the algebra (table unit).

It produces a single empty row, which in the semantics of SPARQL means that joining it to anything returns the other operand unchanged, so in essence it acts as the join identity. In many cases a SPARQL engine can simplify the algebra to remove table unit, since it usually has no effect on the semantics of the query.
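The identity property is easy to check in a small model. This Python sketch assumes solutions are dictionaries and uses a hypothetical `join` helper; the sample rows are made up for illustration.

```python
UNIT = [{}]  # (table unit): one solution with no variable bindings


def compatible(a, b):
    """Two solutions are compatible if every shared variable has the same value."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def join(left, right):
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


people = [{"s": "d:i0432", "last": "Mutt"}, {"s": "d:i9771", "last": "Marshall"}]

assert join(UNIT, people) == people  # identity on the left
assert join(people, UNIT) == people  # identity on the right
assert join([], people) == []        # a table with zero rows is not the identity
```

The last line is the distinction that matters: one empty row joins with everything, while zero rows join with nothing.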

In your first query there is technically another join, between table unit and the first basic graph pattern, but in a normal join the presence of table unit has no effect (as it's the join identity), so it can be, and is, simplified out.

However, with an OPTIONAL the SPARQL specification requires that the algebra produced is a left join of whatever precedes the clause with the pattern inside it. In your second query nothing precedes the first OPTIONAL (technically there is an implicit empty graph pattern there), so the first leftjoin generated has table unit on its left-hand side. Unlike in a normal join, the table unit has to be preserved here because the semantics of leftjoin say that the results from the LHS are preserved if there are no compatible solutions from the RHS.

We can illustrate this with a more trivial query:

    SELECT *
    WHERE
    {
      OPTIONAL { ?s a ?type }
    }

Produces the algebra:

    (base <http://example/base/>
      (leftjoin
        (table unit)
        (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type))))
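To see why table unit cannot be dropped from a leftjoin, here is a rough Python model of that algebra. The `leftjoin` helper and the `ex:` values are hypothetical, and the filter expression of a real leftjoin is ignored.

```python
UNIT = [{}]  # (table unit): one solution with no variable bindings


def compatible(a, b):
    """Two solutions are compatible if every shared variable has the same value."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def leftjoin(left, right):
    """Keep a left solution unchanged when nothing on the right is compatible."""
    out = []
    for a in left:
        merged = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(merged or [a])
    return out


no_types = []  # the BGP matched nothing, e.g. data without rdf:type triples
some_types = [{"s": "ex:a", "type": "ex:Person"}]  # made-up match

print(leftjoin(UNIT, no_types))    # [{}]
print(leftjoin(UNIT, some_types))  # [{'s': 'ex:a', 'type': 'ex:Person'}]
```

With no matches the query still returns one solution in which ?s and ?type are unbound, whereas the bare pattern without OPTIONAL would return no solutions at all. Removing table unit would change that result.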
RobV answered Dec 19 '22 21:12


This question is old, but the answer is still hard to understand clearly. Allow me to try in plain English, with thanks to SPARQL_Order_Matters.

When OPTIONALs appear at the beginning of a query, they either

  • Don't match, and nothing happens
  • Do match, and now their matches are the starting set of solutions against which the rest of the query must match

When OPTIONALs appear after some statement has already matched some data, they either

  • Don't match, and nothing happens
  • Do match, and some new variable bindings are added to the results

So the real non-obvious behavior happens when an OPTIONAL is first, and it matches some triples. Now every query result must be compatible with what that OPTIONAL matched.

Paul Cuddihy answered Dec 19 '22 23:12