
Sparql queries over collection and rdf:containers?

Hi all RDF/SPARQL developers. Here is a question that has been nagging me for a while now, and it seems nobody has answered it accurately since the RDF and SPARQL specifications were released.

To state the case, RDF defines several ways to deal with multi-valued properties for resources, from creating as many triples as needed with the same subject-predicate URIs to using collections or containers. That's all good, since each pattern has its own characteristics.

But seen from the SPARQL point of view, it seems to me that querying those structures leads to overly complicated queries that, worse, cannot be translated into a sensible result set: you cannot use variables to query arbitrary-length structures, and property paths do not preserve the "natural" order.

Naïvely, in many SELECT or ASK queries, if I want to query or filter on the container's or list's values, most of the time I won't care what the underlying pattern really is (if any). So for instance:

<rdf:Description rdf:about="urn:1">
    <rdfs:label>
        <rdf:Alt>
            <rdf:li xml:lang="fr">Exemple n°1</rdf:li>
            <rdf:li xml:lang="en">Example #1</rdf:li>
        </rdf:Alt>
    </rdfs:label>
    <my:release>
        <rdf:Seq>
            <rdf:li>10.0</rdf:li>
            <rdf:li>2.4</rdf:li>
            <rdf:li>1.1.2</rdf:li>
            <rdf:li>0.9</rdf:li>
        </rdf:Seq>
    </my:release>
</rdf:Description>

<rdf:Description rdf:about="urn:2">
    <rdfs:label xml:lang="en">Example #2</rdfs:label>
</rdf:Description>

Obviously I would expect both resources to match the query:

SELECT ?res WHERE { ?res rdfs:label ?label . FILTER ( contains(?label, 'Example'@en) ) }

I would also expect the query:

SELECT ?ver WHERE { <urn:1> my:release ?ver }

to return the rdf:Seq elements (or any rdf:Alt's, for that matter) in their original order, unless an ordering is explicitly specified through an ORDER BY clause. (For the other patterns it wouldn't matter whether the original order is preserved, so why not keep it anyway?)

Of course, it would be necessary to preserve compatibility with the old way, so perhaps a possibility would be to extend the propertyPath syntax with a new operator?

I feel it would greatly simplify day-to-day SPARQL use.

Does it make sense to you? Moreover, do you see any reason why not to try implementing this?

EDIT: corrected the example's urn:2 rdfs:label value, which was incorrect.

Max asked Apr 25 '13 19:04



2 Answers

I realize that this question already has an answer, but it's worth taking a look at what you can do here if you use RDF lists as opposed to the other types of RDF containers. First, the data that you've provided (after providing namespace declarations) in Turtle is:

@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix my:    <https://stackoverflow.com/q/16223095/1281433/> .

<urn:2>  rdfs:label  "Example #2"@en .

<urn:1>  rdfs:label  [ a       rdf:Alt ;
                       rdf:_1  "Exemple n°1"@fr ;
                       rdf:_2  "Example #1"@en
                     ] ;
        my:release  [ a       rdf:Seq ;
                      rdf:_1  "10.0" ;
                      rdf:_2  "2.4" ;
                      rdf:_3  "1.1.2" ;
                      rdf:_4  "0.9"
                    ] .

The properties rdf:_n are the difficulty here, since they are the only thing that provides any real order to the elements in the sequence. (The alt doesn't really have an important sequence, although it still uses rdf:_n properties.) You can get all three labels if you use a SPARQL property path that makes the rdf:_n property optional:

prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?x ?label where {
  ?x rdfs:label/(rdf:_1|rdf:_2|rdf:_3)* ?label
  filter( isLiteral( ?label ))
}
------------------------------
| x       | label            |
==============================
| <urn:1> | "Exemple n°1"@fr |
| <urn:1> | "Example #1"@en  |
| <urn:2> | "Example #2"@en  |
------------------------------
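
The query above only collects the labels; it says nothing about order. If the order of an rdf:Seq matters, it has to be recovered from the number in each rdf:_n property, typically in client code. Here is a minimal Python sketch of that idea, using plain tuples rather than a real RDF library (the blank-node label _:rel and the helper name seq_members are made up for illustration):

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Triples as plain (subject, predicate, object) tuples, deliberately shuffled.
# "_:rel" is a hypothetical label for the blank node holding the rdf:Seq.
triples = [
    ("urn:1", "my:release", "_:rel"),
    ("_:rel", RDF + "type", RDF + "Seq"),
    ("_:rel", RDF + "_3", "1.1.2"),
    ("_:rel", RDF + "_1", "10.0"),
    ("_:rel", RDF + "_4", "0.9"),
    ("_:rel", RDF + "_2", "2.4"),
]

# Matches container membership properties rdf:_1, rdf:_2, ...
MEMBER = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)")


def seq_members(triples, container):
    """Return the members of a container, ordered by the n in rdf:_n."""
    indexed = []
    for s, p, o in triples:
        m = MEMBER.fullmatch(p)
        if s == container and m:
            indexed.append((int(m.group(1)), o))
    return [o for _, o in sorted(indexed)]


releases = seq_members(triples, "_:rel")
print(releases)  # ['10.0', '2.4', '1.1.2', '0.9']
```

Parsing the index as an integer matters once a sequence has ten or more members, since rdf:_10 would otherwise sort before rdf:_2.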

Let's look at what you can do with RDF lists instead. If you use lists, then your data is this:

@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix my:    <https://stackoverflow.com/q/16223095/1281433/> .

<urn:2>  rdfs:label  "Example #2"@en .

<urn:1>  rdfs:label  ( "Exemple n°1"@fr "Example #1"@en ) ;
        my:release  ( "10.0" "2.4" "1.1.2" "0.9" ) .

Now you can get the labels relatively easily:

prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?x ?label where {
  ?x rdfs:label/(rdf:rest*/rdf:first)* ?label
  filter( isLiteral( ?label ))
}
------------------------------
| x       | label            |
==============================
| <urn:1> | "Exemple n°1"@fr |
| <urn:1> | "Example #1"@en  |
| <urn:2> | "Example #2"@en  |
------------------------------

If you want the position of the labels in the list of labels, you can even get that, although it makes the query a bit more complicated:

prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?x ?label (count(?mid)-1 as ?position) where {
  ?x rdfs:label ?y .
  ?y rdf:rest* ?mid . ?mid rdf:rest*/rdf:first? ?label .
  filter(isLiteral(?label))
}
group by ?x ?label
-----------------------------------------
| x       | label            | position |
=========================================
| <urn:1> | "Exemple n°1"@fr | 0        |
| <urn:1> | "Example #1"@en  | 1        |
| <urn:2> | "Example #2"@en  | 0        |
-----------------------------------------

This uses the technique in "Is it possible to get the position of an element in an RDF Collection in SPARQL?" to compute the position of each value in the list that is the object of rdfs:label, starting from 0, and assigning 0 to elements that aren't in a list.
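
For comparison, the same positions fall out directly if you walk the rdf:first/rdf:rest chain in client code. This is only a sketch over plain tuples, not a real RDF API; the cell labels _:l1 and _:l2 and the function name list_items are assumptions for illustration:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

# The rdfs:label list of urn:1 expanded into its underlying triples.
# "_:l1" and "_:l2" are hypothetical labels for the list cells.
triples = [
    ("urn:1", "rdfs:label", "_:l1"),
    ("_:l1", FIRST, '"Exemple n°1"@fr'),
    ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, '"Example #1"@en'),
    ("_:l2", REST, NIL),
]


def list_items(triples, head):
    """Walk a list from its head cell and return (position, item) pairs."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    items, node, seen = [], head, set()
    # Stop at rdf:nil, at a malformed cell, or if the chain loops.
    while node != NIL and node in first and node not in seen:
        seen.add(node)
        items.append(first[node])
        node = rest.get(node, NIL)
    return list(enumerate(items))


labels = list_items(triples, "_:l1")
print(labels)  # [(0, '"Exemple n°1"@fr'), (1, '"Example #1"@en')]
```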

Joshua Taylor answered Nov 17 '22 09:11



RDF defines a vocabulary for collections and containers but they hold no special meaning in terms of how graphs containing them should be interpreted. They aren't intended for and aren't really appropriate for representing multi-valued properties.

In general, saying:

:A :predicate [ a rdf:Alt ; rdf:_1 :B ; rdf:_2 :C ] .

is not equivalent to

:A :predicate :B , :C .

Let's say the predicate is owl:sameAs:

:A owl:sameAs [ a rdf:Alt ; rdf:_1 :B ; rdf:_2 :C ] .

The above says that :A names an individual containing :B and :C, whereas:

:A owl:sameAs :B , :C .

says that :A, :B, and :C are the same individual.

SPARQL is agnostic about containers and collections (aside from the syntactic shorthand for rdf:List). If you want a more convenient way of working with collections, many RDF APIs including Jena and rdflib have first-class representations for them.
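
To make the non-equivalence concrete, here is a small Python sketch of what a single triple pattern such as `:A :predicate ?o` binds in each case. Graphs are modelled as plain tuples, and the blank-node label _:alt and the helper objects are made up for illustration:

```python
def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# :A :predicate [ a rdf:Alt ; rdf:_1 :B ; rdf:_2 :C ] .
container_graph = [
    (":A", ":predicate", "_:alt"),
    ("_:alt", "rdf:type", "rdf:Alt"),
    ("_:alt", "rdf:_1", ":B"),
    ("_:alt", "rdf:_2", ":C"),
]

# :A :predicate :B , :C .
direct_graph = [
    (":A", ":predicate", ":B"),
    (":A", ":predicate", ":C"),
]

print(objects(container_graph, ":A", ":predicate"))  # ['_:alt']
print(objects(direct_graph, ":A", ":predicate"))     # [':B', ':C']
```

In the container case the only value of the property is the container node itself; reaching :B and :C takes a second hop through rdf:_1 and rdf:_2.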

Addendum

The way to model multi-valued properties--that is, to model that both "Exemple n°1"@fr and "Example #1"@en are labels for urn:1--is to simply state the two facts:

<rdf:Description rdf:about="urn:1">
    <rdfs:label xml:lang="fr">Exemple n°1</rdfs:label>
    <rdfs:label xml:lang="en">Example #1</rdfs:label>
    ...
</rdf:Description>

And the query:

SELECT ?res WHERE { ?res rdfs:label ?label . FILTER ( contains(?label, 'Example'@en) ) }

will match on the English labels for <urn:1> and <urn:2>.

For the my:release property, where you have a multi-valued property and an ordering on its values, it's a little trickier. You could define a new property (e.g., my:releases) whose value is an rdf:List or rdf:Seq. my:release gives the direct relationship and my:releases an indirect relationship specifying an explicit ordering. With an inferencing store and the appropriate rule, you would only have to provide the latter. Unfortunately this doesn't make it any easier to use the ordering within SPARQL.

An approach that's easier to work with in SPARQL and non-inferencing stores would be to make the versions themselves objects with properties that define the ordering:

  <rdf:Description rdf:about="urn:1">
    <rdfs:label xml:lang="fr">Exemple n&#xB0;1</rdfs:label>
    <rdfs:label xml:lang="en">Example #1</rdfs:label>
    <my:release>
      <my:Release>
        <dc:issued rdf:datatype="&xsd;date">2008-10-10</dc:issued>
        <my:version>10.0</my:version>
      </my:Release>
    </my:release>
    <my:release>
      <my:Release>
        <my:version>2.4</my:version>
        <dc:issued rdf:datatype="&xsd;date">2007-05-01</dc:issued>
      </my:Release>
    </my:release>
    ...
  </rdf:Description>

In the above, the date can be used to order the results as there is no explicit sequence anymore. The query is only slightly more complex:

SELECT ?ver 
WHERE { <urn:1> my:release [ my:version ?ver ; dc:issued ?date ] }
ORDER BY ?date
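
The same idea in client code, as a rough Python sketch: each release is a small record and the ordering comes from its date, not from the version string. The dict keys and the function name versions_by_date are illustrative, not RDF terms:

```python
from datetime import date

# Each record mirrors one my:Release resource from the RDF/XML above.
releases = [
    {"version": "10.0", "issued": date(2008, 10, 10)},
    {"version": "2.4", "issued": date(2007, 5, 1)},
]


def versions_by_date(releases):
    """Return version strings ordered by issue date, like ORDER BY ?date."""
    return [r["version"] for r in sorted(releases, key=lambda r: r["issued"])]


print(versions_by_date(releases))  # ['2.4', '10.0']

# Sorting the version strings themselves is purely lexical and does not
# give a meaningful release order.
print(sorted(["10.0", "2.4", "1.1.2", "0.9"]))  # ['0.9', '1.1.2', '10.0', '2.4']
```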
user2313838 answered Nov 17 '22 10:11
