
How to skip bad dates in DBpedia SPARQL request?

I need to get data about films from DBpedia.

I run the following SPARQL query on http://dbpedia-live.openlinksw.com/sparql:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?subject ?label ?released WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  FILTER(xsd:date(?released) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20

I am trying to get films released on or after 01.01.2000, but the engine responds as follows:

Virtuoso 22007 Error DT006: Cannot convert 2009-06-31 to datetime : 
Too many days (31, the month has only 30)

SPARQL query:
define sql:big-data-const 0 
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri <http://dbpedia.org> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?subject ?label ?released WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  FILTER(xsd:date(?released) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20

As far as I understand, some of the data in DBpedia is erroneous (here, the impossible date 2009-06-31), so the engine cannot convert the string to a date in order to compare it with the date I set, and it aborts the whole query.

So the question is: is there any way to tell the engine to skip the erroneous data and return everything that could be processed?

Dennis Ivanoff asked Sep 28 '11 10:09


2 Answers

You can use the COALESCE function to substitute a default date for invalid ones:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?subject ?label ?released ?released_fixed WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  bind ( coalesce(xsd:datetime(?released), '1000-01-01') as ?released_fixed)
  FILTER(xsd:date(coalesce(xsd:datetime(?released), '1000-01-01')) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20

This query returns results on the DBpedia Live endpoint.

The BIND clause only serves to display the substituted dates: invalid values are set to '1000-01-01' and stored in the variable ?released_fixed. It is not required for the query and can be omitted, together with ?released_fixed in the SELECT clause.
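
The same default-date idea can also be sketched on the client side, in case the endpoint still rejects the cast. This is only an illustrative Python sketch, not part of the query above: it assumes the release dates have been fetched as plain strings, and the function names and sample rows are hypothetical.

```python
from datetime import date

# Sentinel for values that are not valid calendar dates
# (mirrors the '1000-01-01' default used in the query above).
SENTINEL = date(1000, 1, 1)

def parse_or_default(text, default=SENTINEL):
    """Parse the first 10 characters as YYYY-MM-DD; return default if invalid."""
    try:
        return date.fromisoformat(text[:10])
    except ValueError:
        return default

def films_released_since(rows, cutoff):
    """Keep (label, released) pairs whose parsed date is on or after cutoff."""
    return [(label, released) for label, released in rows
            if parse_or_default(released) >= cutoff]

# Hypothetical sample rows; "2009-06-31" is the invalid value from the error message.
rows = [
    ("Film A", "2009-06-31"),
    ("Film B", "2001-05-04"),
    ("Film C", "1999-12-31"),
    ("Film D", "2000-01-01T00:00:00"),
]

print(films_released_since(rows, date(2000, 1, 1)))
```

Because the sentinel lies far before any realistic cutoff, rows with invalid dates fall out of the result instead of aborting the whole run.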

mgraube answered Nov 19 '22 16:11


One way is to filter using the datatype, as you can see below:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?subject ?label ?released WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  FILTER(datatype(?released) = <http://www.w3.org/2001/XMLSchema#dateTime>)
  FILTER(xsd:date(?released) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20


Franklin Amorim answered Nov 19 '22 16:11