I have unmanaged triples stored as part of individual documents that I am storing in my content db. Essentially each document represent a person, and the defined triple specifies the document URI for the person's manager. I am trying to use SPARQL to determine the length of paths between a manager and all of the people below them in the hierarchy.
The triples in the document look like
<sem:triple xmlns:sem="http://marklogic.com/semantics">
<sem:subject>http://rdf.abbvienet.com/infrastructure/person/10740024</sem:subject>
<sem:predicate>http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager</sem:predicate>
<sem:object>http://rdf.abbvienet.com/infrastructure/person/10206242</sem:object>
</sem:triple>
I have found the following sparql query, which can be used to return a manager, aperson below them in the hierarchy, and the number of nodes distant they are.
select ?manager ?leaf (count(?mid) as ?distance) {
BIND(<http://rdf.abbvienet.com/infrastructure/person/10025613> as ?manager)
?leaf <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>* ?mid .
?mid <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>+ ?manager .
}
group by ?manager ?leaf
order by ?manager ?leaf
This works, but is very slow, even in the case where the hierarchy tree I am looking at is one or two levels deep, around 15s. I have 63,139 manager triples of this type in the db.
Online UI are http://dbpedia.org/snorql/ and http://dbpedia.org/sparql/ , they both accept SparQL queries.
SPARQL and SQL have very similar UNION and MINUS operators, which respectively add and remove solutions from a solution set. Because the datatypes of an SQL table are assumed to be uniform across all rows, care must be taken to align the datatypes of the SELECT.
SPARQL, short for “SPARQL Protocol and RDF Query Language”, enables users to query information from databases or any data source that can be mapped to RDF. The SPARQL standard is designed and endorsed by the W3C and helps users and developers focus on what they would like to know instead of how a database is organized.
SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports aggregation, subqueries, negation, creating values by expressions, extensible value testing, and constraining queries by source RDF graph.
I think the biggest problem is going to be the BIND()
- MarkLogic 8 doesn't optimize the pattern you're using at all well. Can you try substituting your constant into the places you use the ?manager
variable to see if that makes a big difference? i.e.:
select ?leaf (count(?mid) as ?distance) {
?leaf <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>* ?mid .
?mid <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>+
<http://rdf.abbvienet.com/infrastructure/person/10025613> .
}
group by ?leaf
order by ?leaf
StackOverflow isn't a great place to answer performance questions like this, as it really needs a conversation where we work together to help you. Maybe you can try contacting support or the MarkLogic developer mailing list for this kind of question?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With