Suppose that I have the following Turtle declaration:
@prefix : <http://example.org#> . :ls :list (:a :b :c)
Is there a way to get the positions of the elements in the collection?
For example, with this query:
PREFIX : <http://example.org#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> SELECT ?elem WHERE { ?x :list ?ls . ?ls rdf:rest*/rdf:first ?elem . }
I get:
-------- | elem | ======== | :a | | :b | | :c | --------
But I would like a query to obtain:
-------------- | elem | pos | ============== | :a | 0 | | :b | 1 | | :c | 2 | --------------
Is it possible?
SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports aggregation, subqueries, negation, creating values by expressions, extensible value testing, and constraining queries by source RDF graph.
SPARQL FILTER functions like regex can test RDF literals. regex matches only plain literals with no language tag. regex can be used to match the lexical forms of other literals by using the str function.
It's a SPARQL 1.1 property path which describes a route through a graph between two graph nodes, in your case it denotes the inverse path, i.e. from object to subject, thus, it's equivalent to. dbpedia:Stephen_King a ? subtype . with a being just a shortcut for rdf:type.
I have found a way to do it using the property function library in ARQ. As Steve Harris says, this is non-standard.
PREFIX : <http://example.org#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX list: <http://jena.hpl.hp.com/ARQ/list#> SELECT ?elem ?pos WHERE { ?x :list ?ls . ?ls list:index (?pos ?elem). }
I've extended the data to make the problem a little harder. Let's add a duplicate element to the list, e.g., an additional :a
at the end:
@prefix : <http://example.org#> . :ls :list (:a :b :c :a) .
Then we can use a query like this to extract each list node (and its element) along with the position of the node in the list. The idea is that we can match all the individual nodes in the list with a pattern like [] :list/rdf:rest* ?node
. The position of each node, though, is the number of intermediate nodes between the head of the list and ?node
. We can match each of those intermediate nodes by breaking the pattern down into
[] :list/rdf:rest* ?mid . ?mid rdf:rest* :node .
Then if we group by ?node
, the number of distinct ?mid
bindings is the position of ?node
in the list. Thus we can use the following query (which also grabs the element (the rdf:first
) associated with each node) to get the positions of elements in the list:
prefix : <https://stackoverflow.com/q/17523804/1281433/> prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> select ?element (count(?mid)-1 as ?position) where { [] :list/rdf:rest* ?mid . ?mid rdf:rest* ?node . ?node rdf:first ?element . } group by ?node ?element
---------------------- | element | position | ====================== | :a | 0 | | :b | 1 | | :c | 2 | | :a | 3 | ----------------------
This works because the structure of an RDF list is a linked list like this (where ?head
is the beginning of the list (the object of :list
), and is another binding of ?mid
because of the pattern [] :list/rdf:rest* ?mid
):
The asker of the question also posted an answer that uses Jena's ARQ extensions for working with RDF lists. The solution posted in that answer is
PREFIX : <http://example.org#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX list: <http://jena.hpl.hp.com/ARQ/list#> SELECT ?elem ?pos WHERE { ?x :list ?ls . ?ls list:index (?pos ?elem). }
This answer depends on using Jena's ARQ and enabling the extensions, but it is more concise and transparent. What isn't obvious is whether one has an obviously preferable performance. As it turns out, for small lists, the difference isn't particularly significant, but for larger lists, the ARQ extensions have much better performance. The runtime for the pure SPARQL query quickly becomes prohibitively long, while there's almost no difference in the version using the ARQ extensions.
------------------------------------------- | num elements | pure SPARQL | list:index | =========================================== | 50 | 1.1s | 0.8s | | 100 | 1.5s | 0.8s | | 150 | 2.5s | 0.8s | | 200 | 4.8s | 0.8s | | 250 | 9.7s | 0.8s | -------------------------------------------
These specific values will obviously differ depending on your setup, but the general trend should be observable anywhere. Since things could change in the future, here's the particular version of ARQ I'm using:
$ arq --version Jena: VERSION: 2.10.0 Jena: BUILD_DATE: 2013-02-20T12:04:26+0000 ARQ: VERSION: 2.10.0 ARQ: BUILD_DATE: 2013-02-20T12:04:26+0000
As such, if I knew that I had to process lists of non-trivial sizes and that I had ARQ available, I'd use the extension.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With