Is there any way to optimize SPARQL queries?

Tags: sparql, marklogic, marklogic-8

I have unmanaged triples stored as part of individual documents in my content database. Each document represents a person, and the triple it contains gives the document URI of that person's manager. I am trying to use SPARQL to determine the length of the path between a manager and each of the people below them in the hierarchy.

The triples in the documents look like this:

<sem:triple xmlns:sem="http://marklogic.com/semantics">
    <sem:subject>http://rdf.abbvienet.com/infrastructure/person/10740024</sem:subject>
    <sem:predicate>http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager</sem:predicate>
    <sem:object>http://rdf.abbvienet.com/infrastructure/person/10206242</sem:object>
</sem:triple>

I have found the following SPARQL query, which returns a manager, a person below them in the hierarchy, and the number of nodes between them.

select  ?manager ?leaf (count(?mid) as ?distance) { 
  BIND(<http://rdf.abbvienet.com/infrastructure/person/10025613> as ?manager)
  ?leaf <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>* ?mid .
  ?mid <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>+ ?manager .
}
group by ?manager ?leaf 
order by ?manager ?leaf

This works, but it is very slow: around 15 s, even when the hierarchy tree I am looking at is only one or two levels deep. I have 63,139 manager triples of this type in the database.
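For readers wondering why counting `?mid` yields the distance, here is a minimal Python sketch of the same logic over a toy in-memory hierarchy. The names `MANAGER_OF`, `ancestors`, and `distances_below` and the sample people are hypothetical, and the sketch assumes each person has at most one manager. It only illustrates the counting idea and says nothing about how MarkLogic evaluates the query.

```python
# Hypothetical toy hierarchy: person -> direct manager (one manager triple each).
MANAGER_OF = {
    "alice": "root",
    "bob": "alice",
    "carol": "bob",
    "dave": "root",
}


def ancestors(person, include_self):
    """Walk manager edges upward.

    include_self=True mirrors the manager* path (zero or more steps),
    include_self=False mirrors manager+ (one or more steps).
    """
    chain = [person] if include_self else []
    seen = {person}
    current = person
    while current in MANAGER_OF:
        current = MANAGER_OF[current]
        if current in seen:  # guard against cycles in bad data
            break
        seen.add(current)
        chain.append(current)
    return chain


def distances_below(manager):
    """Count, per leaf, the ?mid nodes with leaf manager* mid and mid manager+ manager."""
    people = set(MANAGER_OF) | set(MANAGER_OF.values())
    result = {}
    for leaf in people:
        mids = [
            mid
            for mid in ancestors(leaf, include_self=True)
            if manager in ancestors(mid, include_self=False)
        ]
        if mids:  # leaves not below the manager produce no rows
            result[leaf] = len(mids)
    return result


print(sorted(distances_below("root").items()))
```

Every node on the path from the leaf up to, but not including, the manager matches `?mid`, so a leaf that is d edges below the manager contributes exactly d matches.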

715

asked Jun 20 '16 11:06

TJ Tang

1 Answer

I think the biggest problem is going to be the BIND(): MarkLogic 8 doesn't optimize the pattern you're using well at all. Can you try substituting your constant directly wherever you use the ?manager variable, to see whether that makes a big difference? For example:

select  ?leaf (count(?mid) as ?distance) { 
  ?leaf <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>* ?mid .
  ?mid <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>+
    <http://rdf.abbvienet.com/infrastructure/person/10025613> .
}
group by ?leaf 
order by ?leaf

Stack Overflow isn't a great place to answer performance questions like this, as they really need a conversation in which we work through the problem together. Maybe you can try contacting support or the MarkLogic developer mailing list for this kind of question?
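If the manager IRI varies between calls, the substitution suggested above could be scripted rather than edited by hand. The following is a rough Python sketch under that assumption. The helper name `build_distance_query` is hypothetical, and it only builds the query text; how the string is then submitted to MarkLogic is left open. The character check follows the SPARQL grammar rule for IRI references, so a stray `>` or space cannot break out of the `<...>` delimiters.

```python
import re

# Predicate IRI taken from the question.
MANAGER_PREDICATE = "http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager"

# Characters the SPARQL grammar does not allow inside a <...> IRI reference.
_FORBIDDEN_IN_IRI = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def build_distance_query(manager_iri: str) -> str:
    """Return the query with the manager IRI inlined as a constant (no BIND)."""
    if not manager_iri or _FORBIDDEN_IN_IRI.search(manager_iri):
        raise ValueError(f"not a safe IRI: {manager_iri!r}")
    return (
        "select ?leaf (count(?mid) as ?distance) {\n"
        f"  ?leaf <{MANAGER_PREDICATE}>* ?mid .\n"
        f"  ?mid <{MANAGER_PREDICATE}>+ <{manager_iri}> .\n"
        "}\n"
        "group by ?leaf\n"
        "order by ?leaf"
    )


print(build_distance_query("http://rdf.abbvienet.com/infrastructure/person/10025613"))
```

Whether the inlined form is actually faster on a given MarkLogic 8 installation still has to be measured.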

111

answered Sep 25 '22 12:09

John Snelson
