Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sparql query running forever

I'm struggling with the execution of a SPARQL query in Jena, with a resulting behaviour that I don't understand...

I'm trying to query the Esco ontology (https://ec.europa.eu/esco/download), and I'm using TDB to load the ontology and create the model (sorry if the terms I use are not accurate, I'm not very experienced).

My goal is to find a job position uri in the ontology that matches with the text I have previously extracted: ex: extracted term : "acuponcteur" -> label in ontology: "Acuponcteur"@fr -> uri: <http://ec.europa.eu/esco/occupation/14918>

What I call the "weird behaviour" is related to the results I'm getting (or not) when excuting queries, ie.:

When executing the following query :

PREFIX skos: <http://www.w3.org/2004/02/skos/core#> 
PREFIX esco: <http://ec.europa.eu/esco/model#>      
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>   
SELECT ?position    
WHERE {     
    ?s rdf:type esco:Occupation. 
    { ?position skos:prefLabel ?label. } 
    UNION 
    { ?position skos:altLabel ?label. } 
    FILTER (lcase(?label)= \"acuponcteur\"@fr ) 
}
LIMIT 10 

I get those results after 1 minute :

-----------------------------------------------
| position                                    |
===============================================
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
-----------------------------------------------

However, when I'm trying to add the DISTINCT keyword, thus :

PREFIX skos: <http://www.w3.org/2004/02/skos/core#> 
PREFIX esco: <http://ec.europa.eu/esco/model#>      
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>   
SELECT DISTINCT ?position   
WHERE {     
    ?s rdf:type esco:Occupation. 
    { ?position skos:prefLabel ?label. } 
    UNION 
    { ?position skos:altLabel ?label. } 
    FILTER (lcase(?label)= \"acuponcteur\"@fr ) 
}
LIMIT 10 

it seems like the query keeps running forever (i stopped the execution after 20 minutes waiting...)

I get the same behaviour when executing the same query as the first one (thus without DISTINCT), with another label to match, a label that I'm sure is not in the ontology. While expecting empty result, it (seems like it) keeps running and i have to kill it after a while (once again, i waited 20 minutes to the most) :

PREFIX skos: <http://www.w3.org/2004/02/skos/core#> 
PREFIX esco: <http://ec.europa.eu/esco/model#>      
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>   
SELECT ?position    
WHERE {     
    ?s rdf:type esco:Occupation. 
    { ?position skos:prefLabel ?label. } 
    UNION 
    { ?position skos:altLabel ?label. } 
    FILTER (lcase(?label)= \"assistante scolaire\"@fr ) 
}
LIMIT 10 

May it be a problem in the code I'm running? There it is:

public static void main(String[] args) {

    // Make a TDB-backed dataset
    String directory = "data/testtdb" ;
    Dataset dataset = TDBFactory.createDataset(directory) ;

    // transaction (protects a TDB dataset against data corruption, unexpected process termination and system crashes)
    dataset.begin( ReadWrite.WRITE );
    // assume we want the default model, or we could get a named model here
    Model model = dataset.getDefaultModel();

    try {

          // read the input file - only needs to be done once
          String source = "data/esco.rdf";
          FileManager.get().readModel(model, source, "RDF/XML-ABBREV");

          // run a query

          String queryString =
                    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#> " +
                    "PREFIX esco: <http://ec.europa.eu/esco/model#> " +     
                    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +  
                    "SELECT ?position " +   
                    "WHERE { "  +   
                    "   ?s rdf:type esco:Occupation. " +
                    "   { ?position skos:prefLabel ?label. } " +
                    "   UNION " +
                    "   { ?position skos:altLabel ?label. }" +
                    "   FILTER (lcase(?label)= \"acuponcteur\"@fr ) " +
                    "}" +
                    "LIMIT 1 "  ;

          Query query = QueryFactory.create(queryString) ;

          // execute the query
          QueryExecution qexec = QueryExecutionFactory.create(query, model) ;
          try {
              ResultSet results = qexec.execSelect() ;
              // taken from apache Jena tutorial 
              ResultSetFormatter.out(System.out, results, query) ;

          } finally { 
              qexec.close() ; 
          }

      } finally {
          model.close() ;
          dataset.end();
      }

}

What am I doing wrong here? Any idea?

Thanks!

like image 975
CecileR Avatar asked Aug 14 '14 09:08

CecileR


People also ask

What is construct query in SPARQL?

The CONSTRUCT query form returns an RDF graph. The graph is built based on a template which is used to generate RDF triples based on the results of matching the graph pattern of the query.

Is SPARQL null?

NULLs or unbound variables The term “NULL” actually does not occur in the SPARQL spec, perhaps because it doesn't really have a good rep in the database world. Instead the spec talks about bound and unbound variables.

What is optional SPARQL?

OPTIONAL is a binary operator that combines two graph patterns. The optional pattern is any group pattern and may involve any SPARQL pattern types. If the group matches, the solution is extended, if not, the original solution is given (q-opt3. rq).


1 Answers

As a first point that may or may not make much difference, you can use a property path to simplify

{ ?position skos:prefLabel ?label. } 
UNION 
{ ?position skos:altLabel ?label. } 

as

?position skos:prefLabel|skos:altLabel ?label 

This makes the query:

SELECT ?position    
WHERE {     
    ?s rdf:type esco:Occupation.                   # (1)
    ?position skos:prefLabel|skos:altLabel ?label  # (2)
    FILTER (lcase(?label)="acuponcteur"@fr ) 
}

What's the point of ?s in this query? There are some number n of ?position/?label pairs that match (2), and some number m values of ?s that match (1). The number of results that you get from the query is m×n, but you never use the value of ?s. It looks like you used DISTINCT to get rid of some repeated values, but you didn't look to see why you were getting repeated values in the first place. You should simply remove the useless line (1), and have the query:

SELECT DISTINCT ?position    
WHERE {     
    ?position skos:prefLabel|skos:altLabel ?label
    FILTER (lcase(?label)="acuponcteur"@fr ) 
}

I wouldn't be surprised if, at the point, you don't even need the DISTINCT anymore.

like image 81
Joshua Taylor Avatar answered Oct 28 '22 02:10

Joshua Taylor