
Import RDF data in SQL?

Tags:

rdf

sparql

I am quite comfortable using SQL but am having an impossible time understanding SPARQL. For starters, I don't even understand how to look at the structure of the data (in MySQL I would just run DESCRIBE <table name>) so that I can query the appropriate fields.

Is there a way for me to import an entire RDF dataset into respective tables in a MySQL database?

Barring that, is there a way to SELECT * from all the tables (or whatever the equivalent is) so that I can get all the output data into CSV and take it from there?

The RDF dataset I am trying to query has a SPARQL endpoint and even a guide on How to SPARQL but I am having a hard time understanding it.

For example:

PREFIX meannot: <http://rdf.myexperiment.org/ontologies/annotations/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
SELECT DISTINCT ?annotator_name
WHERE {
  ?comment mebase:annotates <http://www.myexperiment.org/workflows/52> .
  ?comment rdf:type meannot:Comment .
  ?comment mebase:has-annotator ?annotator
  ?annotator sioc:name ?annotator_name
}

This makes little sense to me. Why is there a period at the end of some of the WHERE statements but not others? And what does ?comment mebase:has-annotator ?annotator mean in plain English? Select the annotator's name where the annotator's name is the annotator's name? Huh?

I would be grateful for any resources that you could point me to.

Maiasaura asked Jun 30 '11 22:06


2 Answers

Although SPARQL's syntax looks SQL-like, the way it works is actually quite different, which is the problem you and many others run into when trying to learn it.

Pattern Matching

SPARQL is about triple pattern matching rather than selecting from tables like SQL. Each set of three items in your example represents a triple pattern. So for example:

?comment rdf:type meannot:Comment .

This tells the SPARQL processor to find anything whose rdf:type is meannot:Comment, i.e. things which are of type Comment. In this pattern ?comment is a variable which acts like a wildcard; think of it as a field in SQL that you can select.
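To make the "variable as wildcard" idea concrete, here is a minimal Python sketch (not how a real SPARQL engine is implemented) that matches a single triple pattern against an in-memory list of triples. The example triples are made up, and treating any term that starts with ? as a variable is an assumption of this sketch; prefixed names are kept as plain strings.

```python
# A tiny in-memory "graph": each triple is (subject, predicate, object).
# These example triples are made up for illustration.
TRIPLES = [
    ("ex:c1", "rdf:type", "meannot:Comment"),
    ("ex:c2", "rdf:type", "meannot:Comment"),
    ("ex:w52", "rdf:type", "mebase:Workflow"),
]

def is_var(term):
    """Terms starting with '?' are variables (wildcards)."""
    return term.startswith("?")

def match_pattern(pattern, triples):
    """Return one dict of variable bindings per triple that matches the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable used twice in one pattern must bind to the same value.
                if binding.get(p_term, t_term) != t_term:
                    break
                binding[p_term] = t_term
            elif p_term != t_term:
                break  # a constant term must match exactly
        else:
            results.append(binding)  # no break: every position matched
    return results

# Equivalent of: ?comment rdf:type meannot:Comment .
for row in match_pattern(("?comment", "rdf:type", "meannot:Comment"), TRIPLES):
    print(row)
```

Each returned dict plays the role of one result row, with the variable names as column names.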

If we add another triple pattern that uses the same variable, we are asking the SPARQL processor to find all things which match all of the triple patterns, so:

?comment mebase:annotates <http://www.myexperiment.org/workflows/52> .
?comment rdf:type meannot:Comment .

This finds things which are comments on a specific item.
In SQL terms this would be like writing SELECT commentID FROM COMMENTS WHERE itemID=1234 if that helps you understand it.

As we start adding more variables, you can think of that as doing joins with other tables:

?comment mebase:annotates <http://www.myexperiment.org/workflows/52> .
?comment rdf:type meannot:Comment .
?comment mebase:has-annotator ?annotator .

This finds the comments on a specific item together with the users who made them.
It would be roughly equivalent to SELECT C.commentID, U.userID FROM COMMENTS C INNER JOIN USERS U ON C.userID=U.userID WHERE C.itemID=1234 in SQL.
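The "joins" analogy can be sketched in Python as well: evaluating several triple patterns means extending each partial set of variable bindings with the matches of the next pattern, keeping only combinations that agree on shared variables. This is a naive nested-loop sketch over made-up data (the comment, user and name values are hypothetical), evaluating the full query from the question with the missing . restored.

```python
# Hypothetical data using the vocabulary from the question.
W52 = "<http://www.myexperiment.org/workflows/52>"
TRIPLES = [
    ("ex:c1", "mebase:annotates", W52),
    ("ex:c1", "rdf:type", "meannot:Comment"),
    ("ex:c1", "mebase:has-annotator", "ex:alice"),
    ("ex:c2", "mebase:annotates", W52),
    ("ex:c2", "rdf:type", "meannot:Comment"),
    ("ex:c2", "mebase:has-annotator", "ex:alice"),
    ("ex:alice", "sioc:name", "Alice"),
]

def extend(binding, pattern, triple):
    """Extend a binding so that pattern matches triple, or return None on conflict."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # setdefault binds a new variable, or returns its existing value.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new

def evaluate(patterns, triples):
    """Naive nested-loop join: every pattern extends the bindings found so far."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                b2 = extend(b, pattern, t)
                if b2 is not None:
                    next_bindings.append(b2)
        bindings = next_bindings
    return bindings

QUERY = [
    ("?comment", "mebase:annotates", W52),
    ("?comment", "rdf:type", "meannot:Comment"),
    ("?comment", "mebase:has-annotator", "?annotator"),
    ("?annotator", "sioc:name", "?annotator_name"),
]

rows = evaluate(QUERY, TRIPLES)
# SELECT DISTINCT ?annotator_name: keep one column, drop duplicate values.
names = sorted({row["?annotator_name"] for row in rows})
print(len(rows), names)
```

With this data both comments have the same annotator, so there are two solutions but only one distinct name. This also answers the "select the annotator's name where..." puzzle: ?annotator_name is simply whatever value ends up bound once all four patterns match at the same time.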

Syntax Notes

As far as the syntax goes, the . separates one triple pattern from the next (it is optional after the last pattern in a group).
So the missing . after ?annotator sioc:name ?annotator_name is fine, but the missing one after ?comment mebase:has-annotator ?annotator is actually an error on the part of the people publishing that How to SPARQL guide. I happen to work at one of the universities involved in the project, so I have dropped a colleague a note asking them to fix this.

You may also see ; used at the end of a triple pattern in examples. This is shorthand for repeating the subject, e.g.

?comment mebase:annotates <http://www.myexperiment.org/workflows/52> ;
         rdf:type meannot:Comment .

This means that you don't have to type out ?comment again for the subsequent pattern.

Similarly, a comma (,) repeats both the subject and the predicate:

?comment rdf:type meannot:Comment , ex:Annotation .

This means that ?comment and rdf:type are repeated; in plain English, the above matches things which are of type Comment and also of type Annotation.
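To see exactly what ; and , abbreviate, here is a small Python sketch that expands one subject plus a list of predicates and objects into the full triple patterns a processor would see. The nested-list input format is an assumption of this sketch, not a SPARQL parser; ex:Annotation is taken from the example above.

```python
def expand(subject, predicate_objects):
    """Expand `subject p1 o1 , o2 ; p2 o3 .` into full triple patterns.

    predicate_objects is a list of (predicate, [objects]) pairs:
    each ';' starts a new pair (same subject), and each ','
    adds another object to the current pair (same subject and predicate).
    """
    patterns = []
    for predicate, objects in predicate_objects:
        for obj in objects:
            patterns.append((subject, predicate, obj))
    return patterns

# ?comment mebase:annotates <http://www.myexperiment.org/workflows/52> ;
#          rdf:type meannot:Comment , ex:Annotation .
for triple in expand("?comment", [
    ("mebase:annotates", ["<http://www.myexperiment.org/workflows/52>"]),
    ("rdf:type", ["meannot:Comment", "ex:Annotation"]),
]):
    print(" ".join(triple), ".")
```

The loop prints three fully written-out patterns, each ending in its own . separator.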

Discovering the data structure

RDF is not stored in tables since it is a schemaless data model; the closest thing to tables is named graphs, which are just a way to logically group sets of triples together.

Take a look at this question on exploratory SPARQL queries for some suggestions on queries to try.
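A rough analogue of MySQL's DESCRIBE is to ask which classes are used and which predicates appear on instances of each class. The Python sketch below does that over an in-memory list of made-up triples. Against a live endpoint, a query along the lines of SELECT DISTINCT ?type ?p WHERE { ?s a ?type ; ?p ?o } expresses the same idea, though it may be slow or truncated on a large dataset.

```python
from collections import defaultdict

# Made-up triples for illustration; "rdf:type" marks class membership.
TRIPLES = [
    ("ex:c1", "rdf:type", "meannot:Comment"),
    ("ex:c1", "mebase:annotates", "ex:w52"),
    ("ex:c1", "mebase:has-annotator", "ex:alice"),
    ("ex:alice", "rdf:type", "mebase:User"),
    ("ex:alice", "sioc:name", "Alice"),
]

def describe_classes(triples):
    """Map each class to the sorted predicates its instances use.

    Classes whose instances carry no predicate other than rdf:type are omitted.
    """
    types = defaultdict(set)  # subject -> its classes
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    schema = defaultdict(set)  # class -> predicates seen on its instances
    for s, p, o in triples:
        if p != "rdf:type":
            for cls in types.get(s, ()):
                schema[cls].add(p)
    return {cls: sorted(preds) for cls, preds in schema.items()}

for cls, preds in sorted(describe_classes(TRIPLES).items()):
    print(cls, preds)
```

Unlike a DESCRIBE on a table, nothing forces every instance of a class to use the same predicates, so this is a summary of what the data happens to contain rather than a schema.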

If you just want to select everything, you can do SELECT * WHERE { ?s ?p ?o }. Beware that many endpoints impose a limit on the number of results per query, so even if the endpoint has millions of triples behind it you may only get a few thousand back. You can page through results using LIMIT and OFFSET, e.g.

SELECT * WHERE { ?s ?p ?o } LIMIT 1000 OFFSET 0
SELECT * WHERE { ?s ?p ?o } LIMIT 1000 OFFSET 1000
SELECT * WHERE { ?s ?p ?o } LIMIT 1000 OFFSET 2000
# And so forth until you find no further results
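Automating that paging loop might look like the Python sketch below. The endpoint URL is a placeholder, and the sketch assumes the endpoint follows the SPARQL 1.1 Protocol (query sent as a `query` GET parameter) and can return the standard SPARQL JSON results format; check your endpoint's documentation. It also adds ORDER BY, because the SPARQL specification notes that slicing with LIMIT and OFFSET only gives predictable pages when the solution order is fixed (some endpoints may be slow to sort very large results). The fetch function is a parameter so the loop can be tried without network access.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://example.org/sparql"  # placeholder, not a real endpoint
PAGE_SIZE = 1000

def fetch_page(query):
    """Run one SELECT query over HTTP GET; return its rows (SPARQL JSON 'bindings')."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]

def all_triples(fetch=fetch_page, page_size=PAGE_SIZE):
    """Yield every (s, p, o) value tuple, one LIMIT/OFFSET page at a time."""
    offset = 0
    while True:
        query = (
            "SELECT ?s ?p ?o WHERE { ?s ?p ?o } ORDER BY ?s ?p ?o "
            f"LIMIT {page_size} OFFSET {offset}"
        )
        rows = fetch(query)
        if not rows:
            break  # no further results
        for row in rows:
            yield tuple(row[var]["value"] for var in ("s", "p", "o"))
        # Advance by what was actually returned, in case the server caps pages.
        offset += len(rows)
```

Once ENDPOINT points at a real service, `for s, p, o in all_triples(): print(s, p, o)` walks the whole dataset, and feeding the tuples to Python's csv module gives the CSV export asked about in the question. Advancing the offset by the number of rows received, rather than by PAGE_SIZE, keeps the loop from skipping rows on an endpoint that silently returns fewer rows than requested.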

If you just want all the data to trawl through, look around the site to see whether they offer an RDF dump, which will typically be a zipped archive containing a bunch of RDF files. This will let you look at the data locally.
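Once a dump is on disk, the CSV asked about in the question needs no triple store at all. The Python sketch below assumes the dump is (or has been converted to) N-Triples, the one-triple-per-line format, and uses a deliberately simplified regular expression. It handles IRIs, blank nodes and simple literals with an optional language tag or datatype, but it is not a complete N-Triples parser (for example, escape sequences are left undecoded); for real data a proper RDF library is the safer choice.

```python
import csv
import io
import re

# Simplified N-Triples term: <IRI>, _:blank node, or "literal" with optional @lang / ^^<datatype>.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
LINE = re.compile(r"^\s*" + TERM + r"\s+" + TERM + r"\s+" + TERM + r"\s*\.\s*$")

def ntriples_to_csv(lines, out):
    """Write a header plus one CSV row per parsable triple; return the number skipped."""
    writer = csv.writer(out)
    writer.writerow(["subject", "predicate", "object"])
    skipped = 0
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank line or comment
        m = LINE.match(line)
        if m is None:
            skipped += 1  # outside this simplified grammar
            continue
        writer.writerow(m.groups())
    return skipped

if __name__ == "__main__":
    sample = ['<http://ex.org/a> <http://ex.org/p> "x"@en .']
    buf = io.StringIO()
    ntriples_to_csv(sample, buf)
    print(buf.getvalue())
```

For a real file (the name here is hypothetical): `with open("dump.nt", encoding="utf-8") as src, open("dump.csv", "w", newline="", encoding="utf-8") as dst: ntriples_to_csv(src, dst)`. Returning the skipped count makes it obvious when the simplified grammar is not good enough for a particular dump.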

Putting RDF into SQL tables

There are systems that let you store RDF in SQL-based databases, but take it from someone who has worked with a wide variety of triple stores: this is nowhere near as performant as using a native triple store.
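For intuition about what such systems do, the simplest layout is a single three-column triples table in which every triple pattern becomes one more self-join on that table. The sketch below uses Python's built-in sqlite3 rather than MySQL purely so it runs without a server; the table name, columns and data are hypothetical, and real RDF-on-SQL stores use more elaborate layouts and indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
W52 = "http://www.myexperiment.org/workflows/52"
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:c1", "mebase:annotates", W52),
        ("ex:c1", "rdf:type", "meannot:Comment"),
        ("ex:c1", "mebase:has-annotator", "ex:alice"),
        ("ex:alice", "sioc:name", "Alice"),
        ("ex:c2", "mebase:annotates", W52),
        ("ex:c2", "rdf:type", "meannot:Comment"),
        ("ex:c2", "mebase:has-annotator", "ex:bob"),
        ("ex:bob", "sioc:name", "Bob"),
    ],
)

# One alias per triple pattern from the question's query;
# shared SPARQL variables become join conditions.
SQL = """
SELECT DISTINCT t4.o AS annotator_name
FROM triples t1
JOIN triples t2 ON t2.s = t1.s          -- same ?comment
JOIN triples t3 ON t3.s = t1.s          -- same ?comment
JOIN triples t4 ON t4.s = t3.o          -- ?annotator links t3 to t4
WHERE t1.p = 'mebase:annotates' AND t1.o = ?
  AND t2.p = 'rdf:type' AND t2.o = 'meannot:Comment'
  AND t3.p = 'mebase:has-annotator'
  AND t4.p = 'sioc:name'
ORDER BY annotator_name
"""
names = [row[0] for row in conn.execute(SQL, (W52,))]
print(names)
```

Each extra triple pattern adds another self-join over the same, potentially huge, table, which is one reason this approach tends to lose out to native triple stores.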

You may be interested in R2RML, a new W3C standard (an early Working Draft at the time of writing) that defines a standard way to map relational data to RDF. Some of its documentation may help you better understand the relationship between RDF/SPARQL and SQL.
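The core idea behind mapping relational data to RDF can be sketched in a few lines of Python: each row becomes a subject IRI built from a template over its key, each column becomes a predicate, and a foreign key becomes a link to another row's IRI. This only illustrates the concept and is not R2RML itself (R2RML expresses such mappings declaratively, in RDF); the table, base IRI, IRI templates and column names are all hypothetical.

```python
# Hypothetical rows from a COMMENTS table.
COMMENTS = [
    {"commentID": 1, "itemID": 1234, "userID": 7, "text": "Nice workflow"},
    {"commentID": 2, "itemID": 1234, "userID": 9, "text": "Thanks"},
]

BASE = "http://example.org/"  # hypothetical base IRI
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def row_to_triples(row, table, key, refs=None):
    """Turn one relational row into (subject, predicate, object) tuples.

    refs maps foreign-key columns to the table they reference.
    Literals are not escaped; this is an illustration only.
    """
    refs = refs or {}
    subject = f"<{BASE}{table}/{row[key]}>"  # subject IRI from the key
    triples = [(subject, RDF_TYPE, f"<{BASE}{table}>")]  # table -> class
    for column, value in row.items():
        if column == key:
            continue  # the key is already encoded in the subject IRI
        predicate = f"<{BASE}{table}#{column}>"
        if column in refs:
            # Foreign key: link to the referenced row instead of storing a literal.
            obj = f"<{BASE}{refs[column]}/{value}>"
        else:
            obj = f'"{value}"'
        triples.append((subject, predicate, obj))
    return triples

for row in COMMENTS:
    for t in row_to_triples(row, "COMMENTS", "commentID", refs={"userID": "USERS"}):
        print(*t, ".")
```

Seen this way, the SQL join on userID from earlier in this answer turns into a triple pattern that follows the link from a comment to its user.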

Tutorials

For a fuller tutorial, I'd check out SPARQL by Example, which is by one of the authors of the SPARQL specification and is highly recommended.

RobV answered Nov 18 '22 05:11


You can use RDF2X to convert big RDF dumps to MySQL, PostgreSQL or other relational databases. A simple alternative for smaller datasets is rdf2rdb.
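To give a feel for what "RDF to relational tables" conversion involves, here is a minimal Python sketch of the basic idea: group subjects by rdf:type into one table per class, with one column per predicate. This is not how RDF2X or rdf2rdb are actually implemented, just an illustration under simplifying assumptions (each predicate is treated as single-valued, a subject with several types appears in each of their tables, and untyped subjects are dropped); the data is made up.

```python
from collections import defaultdict

# Made-up triples for illustration.
TRIPLES = [
    ("ex:c1", "rdf:type", "meannot:Comment"),
    ("ex:c1", "mebase:annotates", "ex:w52"),
    ("ex:c1", "mebase:has-annotator", "ex:alice"),
    ("ex:c2", "rdf:type", "meannot:Comment"),
    ("ex:c2", "mebase:annotates", "ex:w52"),
    ("ex:alice", "rdf:type", "mebase:User"),
    ("ex:alice", "sioc:name", "Alice"),
]

def to_tables(triples):
    """Return {class: (columns, rows)}, one 'table' per rdf:type value."""
    props = defaultdict(dict)   # subject -> {predicate: object}
    classes = defaultdict(set)  # subject -> its classes
    for s, p, o in triples:
        if p == "rdf:type":
            classes[s].add(o)
        else:
            # Simplification: a later value for the same predicate overwrites earlier ones.
            props[s][p] = o
    tables = {}
    for cls in sorted({c for cs in classes.values() for c in cs}):
        subjects = sorted(s for s, cs in classes.items() if cls in cs)
        columns = sorted({p for s in subjects for p in props.get(s, {})})
        # Missing values become None, i.e. NULL in a relational table.
        rows = [[s] + [props.get(s, {}).get(p) for p in columns] for s in subjects]
        tables[cls] = (["id"] + columns, rows)
    return tables

for cls, (columns, rows) in to_tables(TRIPLES).items():
    print(cls, columns)
    for row in rows:
        print("   ", row)
```

The single-valued assumption is the weak point: a predicate with several values per subject does not fit in one column, so a real converter has to handle that case separately, e.g. with extra link tables.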

David Příhoda answered Nov 18 '22 04:11