
Using SPARQL to locate a subject with multiple occurrences of same property

Tags: rdf, sparql, jena

I am trying to use SPARQL to return triples where the same subject has multiple objects for the same property, like so:

example:subject1 example:property example:object1
example:subject1 example:property example:object2

I feel like such a query should make use of property paths:

SELECT ?subject WHERE {
  ?subject example:property{2} ?object .
}

I'm running this property-path query using Jena 2.6.4, but I'm not getting any results. Is this due to Jena? Or am I phrasing the query incorrectly? The following query returns the results I expect, but is inelegant:

SELECT ?subject WHERE {
  ?subject example:property ?object1 .
  ?subject example:property ?object2 .
  FILTER(!(?object1=?object2))
}

The property-path query returns results if I use, say, example:property{1,2} or example:property{1}, just not the results I want. So I know Jena is interpreting the syntax correctly, but I also know this is an older version of Jena, so it might not recognize all the features of SPARQL 1.1.

I feel like this is a common kind of query, and should have a more elegant solution (and really, a cookbook solution). Am I right in using property paths to solve this problem, or should I take a different approach? And if I should use property paths, am I using them correctly?

asked Nov 30 '22 03:11 by Steve McCauley


1 Answer

Let's use this data:

@prefix example: <http://example.org/> .
example:subject1 example:property example:object1 .
example:subject1 example:property example:object2 .

Without property paths

A query like this produces ?subjects that have two distinct values for example:property:

prefix example: <http://example.org/>
select ?subject where { 
  ?subject example:property ?object1, ?object2 .
  filter ( ?object1 != ?object2 )
}
--------------------
| subject          |
====================
| example:subject1 |
| example:subject1 |
--------------------
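To inspect the two rows directly, here is a minimal Python sketch (standard library only, no RDF library) that models the two triples as tuples and enumerates bindings the way the basic graph pattern does. The names `TRIPLES` and `match_two_values` are hypothetical, and this only imitates SPARQL matching for illustration; it is not how Jena evaluates queries.

```python
# Minimal model: IRIs as prefixed-name strings, triples as (s, p, o) tuples.
# This imitates SPARQL pattern matching for illustration; it is not an engine.
TRIPLES = [
    ("example:subject1", "example:property", "example:object1"),
    ("example:subject1", "example:property", "example:object2"),
]


def match_two_values(triples, prop):
    """Imitate `?subject prop ?object1, ?object2 . filter(?object1 != ?object2)`.

    Returns every (subject, object1, object2) binding, duplicates included,
    like a SELECT without DISTINCT.
    """
    solutions = []
    for s1, p1, o1 in triples:          # first triple pattern binds ?object1
        for s2, p2, o2 in triples:      # second triple pattern binds ?object2
            # Both patterns share ?subject, so the subjects must be equal.
            if p1 == prop and p2 == prop and s1 == s2 and o1 != o2:
                solutions.append((s1, o1, o2))
    return solutions


solutions = match_two_values(TRIPLES, "example:property")
for subject, o1, o2 in solutions:
    print(subject, o1, o2)
# example:subject1 example:object1 example:object2
# example:subject1 example:object2 example:object1
```

Projecting only ?subject from these two bindings leaves example:subject1 twice, which matches the table above.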

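This is pretty much what you've already got, though. The subject appears twice because the pattern matches with ?object1 and ?object2 bound in either order. To get it down to one result, you can select distinct: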

prefix example: <http://example.org/>
select distinct ?subject where { 
  ?subject example:property ?object1, ?object2 .
  filter ( ?object1 != ?object2 )
}
--------------------
| subject          |
====================
| example:subject1 |
--------------------
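The effect of DISTINCT on the projected column can be sketched in a few lines of Python, assuming the two solutions from the previous query as input. `SOLUTIONS` and `project` are hypothetical names; this models the projection step only, not a SPARQL engine.

```python
# The two solutions of the non-DISTINCT query, as (?subject, ?object1, ?object2).
SOLUTIONS = [
    ("example:subject1", "example:object1", "example:object2"),
    ("example:subject1", "example:object2", "example:object1"),
]


def project(solutions, distinct=False):
    """Keep only ?subject; with distinct=True, drop repeated rows."""
    rows = [subject for subject, _, _ in solutions]
    if distinct:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        rows = list(dict.fromkeys(rows))
    return rows


print(project(SOLUTIONS))                 # ['example:subject1', 'example:subject1']
print(project(SOLUTIONS, distinct=True))  # ['example:subject1']
```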

About property paths

Property paths are a way of expressing chains of properties (forward and backward) without needing to bind all the individual resources along the way, which is especially important if a variable number of edges are to be allowed. You can bind the things at either end of the chain, but not the things in the middle.

The data, graphically, looks like this:

        example:object1 <-[example:property]- example:subject1 -[example:property]-> example:object2

If you wanted to select the two objects that are related to some subject, you could use a property path. The path from example:object1 to example:object2 is (^example:property)/example:property, because you follow an example:property edge backward to example:subject1, and then follow an example:property edge forward to example:object2. If you wanted the objects, but not the subject, you could use the following query:

prefix example: <http://example.org/>
select * where { 
  ?object1 (^example:property)/example:property ?object2 .
  filter ( ?object1 != ?object2 )
}
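As a rough illustration of how the sequence path joins on a middle node without returning it, here is a Python sketch over the same two triples. `inverse_then_forward` is a hypothetical helper; it collects pairs in a set, so it models which pairs match rather than SPARQL's exact solution multiplicities.

```python
TRIPLES = [
    ("example:subject1", "example:property", "example:object1"),
    ("example:subject1", "example:property", "example:object2"),
]


def inverse_then_forward(triples, prop):
    """Pairs (x, y) connected by the path (^prop)/prop.

    ^prop steps backward from x to a subject m (m prop x);
    prop steps forward from m to y (m prop y).
    m is used for the join but is not part of the result.
    """
    pairs = set()
    for m1, p1, x in triples:
        for m2, p2, y in triples:
            if p1 == prop and p2 == prop and m1 == m2:
                pairs.add((x, y))
    return pairs


pairs = inverse_then_forward(TRIPLES, "example:property")
# The path alone also relates each object to itself; the filter removes those.
filtered = sorted((x, y) for x, y in pairs if x != y)
print(filtered)
# [('example:object1', 'example:object2'), ('example:object2', 'example:object1')]
```

Without the filter, the path also matches each object paired with itself, because the backward and forward steps can use the same triple.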

I don't think there's a convenient way to get the subject using a property path. You could do something like

?subject example:property/^example:property/example:property/^example:property ?subject

to go from ?subject out to some object, back to some resource (i.e., not necessarily ?subject), out again, and then back to ?subject, but you would no longer have a guarantee that there are two distinct objects.
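To see why the guarantee is lost, this Python sketch follows that four-step path with set-based reachability. The helper `path_reaches` and the one-value data (example:subject2, example:object3) are hypothetical; the point is only that a subject with a single value for example:property still gets back to itself.

```python
def path_reaches(triples, prop, start):
    """Nodes reachable from `start` via prop/^prop/prop/^prop (set semantics)."""

    def forward(nodes):   # follow prop from subject to object
        return {o for s, p, o in triples if p == prop and s in nodes}

    def backward(nodes):  # follow ^prop from object back to subject
        return {s for s, p, o in triples if p == prop and o in nodes}

    return backward(forward(backward(forward({start}))))


# Hypothetical data: a subject with only ONE value for example:property.
ONE_VALUE = [("example:subject2", "example:property", "example:object3")]

print("example:subject2" in path_reaches(ONE_VALUE, "example:property", "example:subject2"))
# True
```

Every step can reuse the same triple, so the path matches whether the subject has one value or several.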

The path language for SPARQL property paths is described in section 9.1 Property Path Syntax of the SPARQL 1.1 Query Language recommendation (the W3C standard). Notably, it doesn't include the p{n} notation that Section 3 Path Language of the earlier working draft did. This means that your pattern

?subject example:property{2} ?object

isn't actually legal SPARQL (though some implementations might support it). However, according to the working drafts, we can still determine what it means. To match this pattern, you'd need data of the form

        ?subject -[example:property]-> [] -[example:property]-> ?object

where [] just indicates some arbitrary resource. This isn't the same shape as the data that you've actually got. So even if this syntax were legal in SPARQL 1.1, it wouldn't give you the type of result that you're looking for. In general, property paths are essentially regular expressions over chains of properties in the data.
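A small Python sketch makes the shape difference concrete, assuming the draft reading of example:property{2} as example:property/example:property. `two_step` is a hypothetical helper, and the chain data (example:a, example:b, example:c) is made up for contrast with the question's fan-out data.

```python
def two_step(triples, prop):
    """Pairs (s, o) linked by prop/prop, i.e. the drafts' reading of prop{2}."""
    return {
        (s, o)
        for s, p1, mid in triples
        for mid2, p2, o in triples
        if p1 == prop and p2 == prop and mid == mid2
    }


# The question's data: one subject fanning out to two objects.
FAN_OUT = [
    ("example:subject1", "example:property", "example:object1"),
    ("example:subject1", "example:property", "example:object2"),
]
# Hypothetical chained data: a -> b -> c.
CHAIN = [
    ("example:a", "example:property", "example:b"),
    ("example:b", "example:property", "example:c"),
]

print(two_step(FAN_OUT, "example:property"))  # set()
print(two_step(CHAIN, "example:property"))    # {('example:a', 'example:c')}
```

The fan-out data never has an object that is also a subject of example:property, so the two-step path finds nothing there.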

Conclusion

While property paths can make some things very nice, and some otherwise impossible things possible (e.g., see my answer to "Is it possible to get the position of an element in an RDF Collection in SPARQL?"), I don't think that they're appropriate for this case. I think that your best bet, and a rather elegant solution, is:

?subject example:property ?object1, ?object2 .
filter( ?object1 != ?object2 ).

because it most plainly captures the intended query, “find ?subjects with two distinct values of example:property.”

answered May 10 '23 05:05 by Joshua Taylor