Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate length of path between nodes?

Tags:

rdf

sparql

jena

How can I retrieve the length of a path between two nodes? For instance, given an organizational hierarchy, how can I determine how far separated are a parent and an descendant organization? Consider the following scenarios:

  1. OrgA -hasSubOrganization-> OrgB, OrgC

    This is the very simplistic case where I want to get all the immediate suborganizations of an entity. Hence the path length is 1.

  2. OrgA -> OrgB -> OrgC

    or the general case

    OrgA -> OrgB - - - - - - - - OrgZ
    

I want to recursively traverse down the graph and find each organization belonging to another organization through the hasSubOrganization property. To get all the sub-organizations recursive I can use property paths, e.g., the + operator:

OrgA hasSubOrganization+ ?subOrg

This will give me all the suborganizations right down to the leaf nodes. But my ultimate goal is to build the organization hierarchy, but the information about the "Number of nodes/steps/levels/hops away a suborganization is" is lost. This means that I cannot recreate the org structure for a visualization.

How can I capture the "number of nodes away" information in addition to the name of the suborganization?

like image 225
Chantz Avatar asked Mar 04 '11 20:03

Chantz


People also ask

How do you calculate path length?

Path length is simply the distance between two nodes, measured as the number of edges between them. If Amy is Brad's friend, and Brad is Calvin's friend, then the path length between Amy and Calvin is 2.

How do you find the path length of a graph?

In calculating the average path length, we must find the shortest path from a source node to all other nodes contained within the graph. Previously, we found that by using an inefficient algorithm, experimentally calculating the path length of a graph can be time consuming.

What is the length of the single path tree with n nodes?

The path length of a tree with N nodes is the sum of the path lengths of the subtrees of its root plus N-1.

What do you mean by path length of a node in a tree?

Now, the sum of the lengths of the paths from the root to each node in a tree is called the path length of the tree. Thus, one way we can measure the efficiency of a class of trees is by studying the path length of trees of a given size.


2 Answers

This is based on the same technique used to compute the position of an element in an RDF list using SPARQL that is described in: Is it possible to get the position of an element in an RDF Collection in SPARQL?

If you have data like this:

@prefix : <http://example.org> .

:orgA :hasSuborganization :orgB, :orgC, :orgD.
:orgB :hasSuborganization :orgE, :orgF.
:orgE :hasSuborganization :orgG.
:orgG :hasSuborganization :orgH.

which describes a hierarchy like this:

organization hierarchy

then you can use a query like this:

prefix : <http://example.org> 

select ?super ?sub (count(?mid) as ?distance) { 
  ?super :hasSuborganization* ?mid .
  ?mid :hasSuborganization+ ?sub .
}
group by ?super ?sub 
order by ?super ?sub

to get results like these:

$ sparql --query query.rq --data subs.n3
----------------------------
| super | sub   | distance |
============================
| :orgA | :orgB | 1        |
| :orgA | :orgC | 1        |
| :orgA | :orgD | 1        |
| :orgA | :orgE | 2        |
| :orgA | :orgF | 2        |
| :orgA | :orgG | 3        |
| :orgA | :orgH | 4        |
| :orgB | :orgE | 1        |
| :orgB | :orgF | 1        |
| :orgB | :orgG | 2        |
| :orgB | :orgH | 3        |
| :orgE | :orgG | 1        |
| :orgE | :orgH | 2        |
| :orgG | :orgH | 1        |
----------------------------

The trick here is to recognize that any path from X to Y can be viewed as a (possibly empty) path from X to some intermediate node Z (nonempty means that you can choose X as Z) concatenated with a (non empty) path from Z to Y. The number of possible ways of picking Z indicates the length of the path.

like image 99
Joshua Taylor Avatar answered Nov 09 '22 00:11

Joshua Taylor


You can't do this using propery paths since the working group specifically chose not to make this information available as it makes implementation much more complex.

If you want to generate a hierarchy it will probably be just as efficient to make a whole series of SPARQL queries where each query expands one leaf of the hierarchy and not use property paths at all if your goal is just to visualise the hierarchy

There may be other approaches using the Jena Ontology API - I'd recommend asking on their mailing list [email protected] for more expert help

like image 40
RobV Avatar answered Nov 09 '22 02:11

RobV