 

Is there any way to optimize SPARQL queries?

I have unmanaged triples stored as part of individual documents in my content database. Essentially, each document represents a person, and its triple specifies the document URI of that person's manager. I am trying to use SPARQL to determine the length of the paths between a manager and all of the people below them in the hierarchy.

The triples in the documents look like this:

<sem:triple xmlns:sem="http://marklogic.com/semantics">
    <sem:subject>http://rdf.abbvienet.com/infrastructure/person/10740024</sem:subject>
    <sem:predicate>http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager</sem:predicate>
    <sem:object>http://rdf.abbvienet.com/infrastructure/person/10206242</sem:object>
</sem:triple>

I have found the following SPARQL query, which returns a manager, each person below them in the hierarchy, and how many nodes apart the two are.

select  ?manager ?leaf (count(?mid) as ?distance) { 
  BIND(<http://rdf.abbvienet.com/infrastructure/person/10025613> as ?manager)
  ?leaf <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>* ?mid .
  ?mid <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>+ ?manager .
}
group by ?manager ?leaf 
order by ?manager ?leaf
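To see why count(?mid) works as a distance, here is a minimal Python sketch of the same pattern evaluated by brute force. It is not how MarkLogic evaluates the query. The person IDs p1 to p5 and the MANAGER_EDGES list are hypothetical stand-ins for the manager triples, and the sketch assumes the hierarchy is acyclic (each person has at most one manager). For every candidate ?leaf it collects the nodes reachable by zero or more manager hops (the manager* step), then keeps those from which the root manager is reachable by one or more hops (the manager+ step). The number of surviving ?mid nodes equals the number of hops from the leaf up to the manager.

```python
from collections import defaultdict

# Hypothetical sample data: (person, manager) pairs standing in for the
# infrastructure.owl#manager triples (subject = person, object = manager).
MANAGER_EDGES = [
    ("p2", "p1"),
    ("p3", "p1"),
    ("p4", "p2"),
    ("p5", "p4"),
]


def ancestors_plus(person, manager_of):
    """Nodes reachable by one or more manager hops (like `manager+`).

    Assumes an acyclic hierarchy; a cycle would loop forever.
    """
    result = []
    current = manager_of.get(person)
    while current is not None:
        result.append(current)
        current = manager_of.get(current)
    return result


def distances(root, edges):
    """Brute-force evaluation of the question's pattern for a fixed ?manager."""
    manager_of = dict(edges)
    people = set(manager_of) | set(manager_of.values())
    counts = defaultdict(int)
    for leaf in people:  # every person is tried as ?leaf
        # ?leaf manager* ?mid : the leaf itself (zero hops) plus its ancestors
        mids = [leaf] + ancestors_plus(leaf, manager_of)
        for mid in mids:
            # ?mid manager+ ?manager : root must be a strict ancestor of ?mid
            if root in ancestors_plus(mid, manager_of):
                counts[leaf] += 1
    return dict(sorted(counts.items()))


print(distances("p1", MANAGER_EDGES))
```

With p1 as the manager this prints {'p2': 1, 'p3': 1, 'p4': 2, 'p5': 3}. Note that the loop visits every person as a candidate leaf, which hints at why the unrestricted pattern can be costly over tens of thousands of triples.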

This works, but it is very slow: around 15 s, even when the hierarchy tree I am looking at is only one or two levels deep. I have 63,139 manager triples of this type in the database.

asked Jun 20 '16 at 11:06 by TJ Tang




1 Answer

I think the biggest problem is going to be the BIND(): MarkLogic 8 does not optimize the pattern you're using well at all. Can you try substituting your constant wherever you use the ?manager variable, to see whether that makes a big difference? For example:

select  ?leaf (count(?mid) as ?distance) { 
  ?leaf <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>* ?mid .
  ?mid <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>+
    <http://rdf.abbvienet.com/infrastructure/person/10025613> .
}
group by ?leaf 
order by ?leaf
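As a rough analogy for why a constant endpoint can help (this says nothing about MarkLogic's actual query planner), the following Python sketch starts from a known manager and walks only that manager's subtree with a breadth-first search over reversed manager edges, instead of trying every person as a candidate leaf. The IDs p1 to p5 and the EDGES list are hypothetical sample data, and the hierarchy is assumed to be a tree.

```python
from collections import defaultdict, deque

# Hypothetical sample data: (person, manager) pairs.
EDGES = [
    ("p2", "p1"),
    ("p3", "p1"),
    ("p4", "p2"),
    ("p5", "p4"),
]


def subtree_distances(root, edges):
    """Hop distance from `root` to everyone below it, via breadth-first search."""
    # Reverse index: manager -> list of direct reports.
    reports = defaultdict(list)
    for person, manager in edges:
        reports[manager].append(person)

    dist = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in reports[node]:
            if child not in seen:  # guard against revisiting a node
                seen.add(child)
                dist[child] = depth + 1
                queue.append((child, depth + 1))
    return dict(sorted(dist.items()))


print(subtree_distances("p1", EDGES))
```

This prints the same distances as counting ?mid nodes, {'p2': 1, 'p3': 1, 'p4': 2, 'p5': 3}, but the work is proportional to the size of the manager's subtree rather than to the whole set of manager triples.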

StackOverflow isn't a great place to answer performance questions like this, as it really needs a conversation where we work together to help you. Maybe you can try contacting support or the MarkLogic developer mailing list for this kind of question?

answered Sep 25 '22 at 12:09 by John Snelson