
Functions to manipulate RDF collections in SPARQL

Tags: rdf, sparql

I would like to know if there are some functions to manipulate RDF Collections in SPARQL.

A motivating problem is the following.

Suppose you have:

@prefix : <http://example.org#> .
:x1 :value 3 .
:x2 :value 5 .
:x3 :value 6 .
:x4 :value 8 .

:list :values (:x1 :x2 :x3 :x4) .

And you want to calculate the following formula: ((Xn - Xn-1) + ... + (X2 - X1)) / (N - 1)

Is there some general way to calculate it?

Up until now, I was only able to calculate it for a fixed set of values. For example, for 4 values, I can use the following query:

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?r { 
 ?list :values ?ls .
 ?ls rdf:first ?x1 .
 ?ls rdf:rest/rdf:first ?x2 .
 ?ls rdf:rest/rdf:rest/rdf:first ?x3 .
 ?ls rdf:rest/rdf:rest/rdf:rest/rdf:first ?x4 .
 ?x1 :value ?v1 .
 ?x2 :value ?v2 .
 ?x3 :value ?v3 .
 ?x4 :value ?v4 .
 BIND ( ((?v4 - ?v3) + (?v3 - ?v2) + (?v2 - ?v1)) / 3 as ?r)
}

What I would like is some way to access the Nth value and to define some kind of recursive function to calculate that expression. I think it is not possible, but maybe someone has a nice solution.

Labra asked Mar 24 '23 10:03


1 Answer

No built-ins that make formulas easier…

SPARQL does include some mathematical functions for arithmetic and aggregate computations. However, I don't know of any particularly convenient way to represent mathematical expressions concisely in SPARQL. I've been looking at a paper lately that discusses an ontology for representing mathematical objects such as expressions and definitions. The authors implemented a system to evaluate these, but I don't think it used SPARQL (or at least, it wasn't just a simple extension of SPARQL).

Wenzel, Ken, and Heiner Reinhardt. "Mathematical Computations for Linked Data Applications with OpenMath." Joint Proceedings of the 24th Workshop on OpenMath and the 7th Workshop on Mathematical User Interfaces (MathUI). 2012.

…but we can still do this case.

That said, this particular case isn't too hard to do, since RDF lists are fairly easy to work with in SPARQL, and SPARQL includes the mathematical functions needed for this expression. First, a bit about RDF list representation, which will make the solution easier to understand. (If you're already familiar with this, you can skip the next paragraph or two.)

RDF lists are linked lists: each list is related to its first element by the rdf:first property, and to the rest of the list by rdf:rest. So the convenient notation (:x1 :x2 :x3 :x4) is actually shorthand for:

_:l1 rdf:first :x1 ; rdf:rest _:l2 .
_:l2 rdf:first :x2 ; rdf:rest _:l3 .
_:l3 rdf:first :x3 ; rdf:rest _:l4 .
_:l4 rdf:first :x4 ; rdf:rest rdf:nil .

Representing blank nodes with [], we can make this a bit clearer:

[ rdf:first :x1 ;
  rdf:rest [ rdf:first :x2 ;
             rdf:rest [ rdf:first :x3 ;
                        rdf:rest [ rdf:first :x4 ;
                                   rdf:rest rdf:nil ]]]]
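
To make the linked-list structure concrete, here is a minimal Python sketch (standard library only, not an RDF library) that models the rdf:first and rdf:rest triples above as plain dictionaries and walks the list. The names first, rest, and walk, and the node labels l1..l4, are illustrative assumptions and not part of any RDF API.

```python
# Plain-Python model of the list (:x1 :x2 :x3 :x4). The keys l1..l4 stand in
# for the blank nodes, and "nil" stands in for rdf:nil.
first = {"l1": "x1", "l2": "x2", "l3": "x3", "l4": "x4"}
rest = {"l1": "l2", "l2": "l3", "l3": "l4", "l4": "nil"}

def walk(head):
    """Yield each element by following rest links until nil is reached."""
    node = head
    while node != "nil":
        yield first[node]
        node = rest[node]

elements = list(walk("l1"))
print(elements)  # ['x1', 'x2', 'x3', 'x4']
```

Starting the walk at a later node (for example l3) yields the corresponding sublist, which is why every rdf:rest of a list is itself a list.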

Once the head of the list has been identified (that is, the node whose rdf:first is :x1), any list l reachable from it by an even number of rdf:rest steps (including 0), that is, by any number of repetitions of rdf:rest/rdf:rest, is a list whose rdf:first is an odd-numbered element of the list (since you started indexing at 1). Going forward one rdf:rest from l, we arrive at a list l' whose rdf:first is an even-numbered element of the list.

Since SPARQL 1.1 property paths let us write (rdf:rest/rdf:rest)* to denote any even number of repetitions of rdf:rest, we can write the following query, which binds the :value of each odd-numbered element to ?n and the :value of the following even-numbered element to ?nPlusOne. The math in the SELECT form is straightforward, although to get N-1 we actually use 2*COUNT(*)-1, because the number of rows (each of which binds elements n and n+1) is N/2 when the list has an even number of elements.

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ( SUM(?nPlusOne-?n)/(2*COUNT(*)-1) as ?result) {
 ?list :values [ (rdf:rest/rdf:rest)* [ rdf:first [ :value ?n ] ; 
                                        rdf:rest  [ rdf:first [ :value ?nPlusOne ]]]] .
}
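
As a rough cross-check of which pairs this pattern matches, the following Python sketch (standard library only) mimics the query on the sample values. It assumes the list values are already available in order as a plain Python list, which is a simplification of what the property path does, and the names values, pairs, and result are illustrative.

```python
from fractions import Fraction

# Values of the list elements in order (x1..x4 from the sample data).
values = [3, 5, 6, 8]

# (rdf:rest/rdf:rest)* reaches the nodes at 0-based offsets 0, 2, 4, ...;
# each match also needs a following element, so a trailing unpaired
# element produces no row.
pairs = [(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]

# Mirrors SUM(?nPlusOne - ?n) / (2*COUNT(*) - 1) from the query.
result = Fraction(sum(b - a for a, b in pairs), 2 * len(pairs) - 1)
print(pairs, result)  # [(3, 5), (6, 8)] 4/3
```

Using Fraction keeps the result exact; 4/3 corresponds to the decimal 1.333... that ARQ reports.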

Results (using Jena's command line ARQ):

$ arq --query query.sparql --data data.n3 
------------------------------
| result                     |
==============================
| 1.333333333333333333333333 |
------------------------------

which is what is expected since

 (5 - 3) + (8 - 6)     2 + 2     4      _ 
------------------- = ------- = --- = 1.3
      (4 - 1)            3       3

Update

I just realized that what is implemented above was based on my comment on the question about whether the summation was correct, because it simplified very easily. That is, the above implements

((x2 - x1) + (x4 - x3) + ... + (xN - xN-1)) / (N - 1)

whereas the original question asked for

((x2 - x1) + (x3 - x2) + … + (xN-1 - xN-2) + (xN - xN-1)) / (N - 1)

The original is even simpler, since the pairs are identified by each rdf:rest of the original list, not just even numbers of repetitions. Every pair of adjacent elements produces one row, so COUNT(*) is already N-1. Using the same approach as above, this can be written as:

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ( SUM(?nPlusOne-?n)/COUNT(*) as ?result) {
 ?list :values [ rdf:rest* [ rdf:first [ :value ?n ] ; 
                             rdf:rest  [ rdf:first [ :value ?nPlusOne ]]]] .
}

Results:

$ arq --query query.sparql --data data.n3 
------------------------------
| result                     |
==============================
| 1.666666666666666666666666 |
------------------------------

Of course, since the expression can be simplified to

(xN - x1) / (N - 1)

we can also just use a query which binds ?x1 to the first element of the list, ?xn to the last element, and ?xi to each element of the list (so that COUNT(?xi) (and also COUNT(*)) is the number of items in the list):

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT (((?xn-?x1)/(COUNT(?xi)-1)) as ?result) WHERE {
 ?list :values [ rdf:rest*/rdf:first [ :value ?xi ] ;
                 rdf:first [ :value ?x1 ] ;
                 rdf:rest* [ rdf:first [ :value ?xn ] ; 
                             rdf:rest  rdf:nil ]] .
}
GROUP BY ?x1 ?xn

Results:

$ arq --query query.sparql --data data.n3 
------------------------------
| result                     |
==============================
| 1.666666666666666666666666 |
------------------------------
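
The last two queries agree because the sum of consecutive differences telescopes: every intermediate value is added once and subtracted once, leaving only xN - x1. A small Python sketch (standard library only; the variable names are illustrative, and the values are taken from the sample data) checks this for the example list:

```python
from fractions import Fraction

values = [3, 5, 6, 8]  # x1..xN from the sample data

# Sum over every consecutive pair, as matched by rdf:rest* in the second query.
diffs = [b - a for a, b in zip(values, values[1:])]
pairwise = Fraction(sum(diffs), len(diffs))

# Telescoped form: only the first and last values survive the summation.
telescoped = Fraction(values[-1] - values[0], len(values) - 1)

print(pairwise, telescoped)  # 5/3 5/3
```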
Joshua Taylor answered Apr 01 '23 16:04