whenever I start using SQL I tend to throw a couple of exploratory statements at the database in order to understand what is available, and what form the data takes.
e.g.
show tables describe table select * from table
Could anyone help me understand the way to complete a similar exploration of an RDF datastore using a SPARQL endpoint?
SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports aggregation, subqueries, negation, creating values by expressions, extensible value testing, and constraining queries by source RDF graph.
Correspondingly, a SPARQL query consists of a set of triple patterns in which each element (the subject, predicate and object) can be a variable (wildcard). Solutions to the variables are then found by matching the patterns in the query to triples in the dataset.
SQL does this by accessing tables in relational databases, and SPARQL does this by accessing a web of Linked Data. (Of course, SPARQL can be used to access relational data as well, but it was designed to merge disparate sources of data.)
SPARQL is a declarative programming language and protocol for graph database analytics. SPARQL has the capability to perform all the analytics that SQL can perform, plus it can be used for semantic analysis, the examination of relationships.
Well, the obvious first start is to look at the classes and properties present in the data.
Here is how to see what classes are being used:
SELECT DISTINCT ?class WHERE { ?s a ?class . } LIMIT 25 OFFSET 0
(LIMIT
and OFFSET
are there for paging. It is worth getting used to these especially if you are sending your query over the Internet. I'll omit them in the other examples.)
a
is a special SPARQL (and Notation3/Turtle) syntax to represent the rdf:type
predicate - this links individual instances to owl:Class
/rdfs:Class
types (roughly equivalent to tables in SQL RDBMSes).
Secondly, you want to look at the properties. You can do this either by using the classes you've searched for or just looking for properties. Let's just get all the properties out of the store:
SELECT DISTINCT ?property WHERE { ?s ?property ?o . }
This will get all the properties, which you probably aren't interested in. This is equivalent to a list of all the row columns in SQL, but without any grouping by the table.
More useful is to see what properties are being used by instances that declare a particular class:
SELECT DISTINCT ?property WHERE { ?s a <http://xmlns.com/foaf/0.1/Person>; ?property ?o . }
This will get you back the properties used on any instances that satisfy the first triple - namely, that have the rdf:type
of http://xmlns.com/foaf/0.1/Person
.
Remember, because a rdf:Resource can have multiple rdf:type properties - classes if you will - and because RDF's data model is additive, you don't have a diamond problem. The type is just another property - it's just a useful social agreement to say that some things are persons or dogs or genes or football teams. It doesn't mean that the data store is going to contain properties usually associated with that type. The type doesn't guarantee anything in terms of what properties a resource might have.
You need to familiarise yourself with the data model and the use of SPARQL's UNION and OPTIONAL syntax. The rough mapping of rdf:type to SQL tables is just that - rough.
You might want to know what kind of entity the property is pointing to. Firstly, you probably want to know about datatype properties - equivalent to literals or primitives. You know, strings, integers, etc. RDF defines these literals as all inheriting from string. We can filter out just those properties that are literals using the SPARQL filter method isLiteral
:
SELECT DISTINCT ?property WHERE { ?s a <http://xmlns.com/foaf/0.1/Person>; ?property ?o . FILTER isLiteral(?o) }
We are here only going to get properties that have as their object a literal - a string, date-time, boolean, or one of the other XSD datatypes.
But what about the non-literal objects? Consider this very simple pseudo-Java class definition as an analogy:
public class Person { int age; Person marriedTo; }
Using the above query, we would get back the literal that would represent age if the age property is bound. But marriedTo isn't a primitive (i.e. a literal in RDF terms) - it's a reference to another object - in RDF/OWL terminology, that's an object property. But we don't know what sort of objects are being referred to by those properties (predicates). This query will get you back properties with the accompanying types (the classes of which ?o
values are members of).
SELECT DISTINCT ?property, ?class WHERE { ?s a <http://xmlns.com/foaf/0.1/Person>; ?property ?o . ?o a ?class . FILTER(!isLiteral(?o)) }
That should be enough to orient yourself in a particular dataset. Of course, I'd also recommend that you just pull out some individual resources and inspect them. You can do that using the DESCRIBE query:
DESCRIBE <http://example.org/resource>
There are some SPARQL tools - SNORQL, for instance - that let you do this in a browser. The SNORQL instance I've linked to has a sample query for exploring the possible named graphs, which I haven't covered here.
If you are unfamiliar with SPARQL, honestly, the best resource if you get stuck is the specification. It's a W3C spec but a pretty good one (they built a decent test suite so you can actually see whether implementations have done it properly or not) and if you can get over the complicated language, it is pretty helpful.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With