Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to access OWL documents using XPath in Java?

Tags:

java

xpath

rdf

owl

I am having an OWL document in the form of an XML file. I want to extract elements from this document. My code works for simple XML documents, but it does not work with OWL XML documents.

I was actually looking to get this element: /rdf:RDF/owl:Ontology/rdfs:label, for which I did this:

 DocumentBuilder builder = builderfactory.newDocumentBuilder();
    Document xmlDocument = builder.parse(
            new File(XpathMain.class.getResource("person.xml").getFile()));

    XPathFactory factory = javax.xml.xpath.XPathFactory.newInstance();
    XPath xPath = factory.newXPath();
    XPathExpression xPathExpression = xPath.compile("/rdf:RDF/owl:Ontology/rdfs:label/text()");
    String nameOfTheBook = xPathExpression.evaluate(xmlDocument,XPathConstants.STRING).toString();

I also tried extracting only the rdfs:label element this way:

 XPathExpression xPathExpression = xPath.compile("//rdfs:label");        
 NodeList nodes = (NodeList) xPathExpression.evaluate(xmlDocument, XPathConstants.NODESET);

But this nodelist is empty.

Please let me know where I am going wrong. I am using Java XPath API.

like image 634
Shafi Avatar asked Jun 11 '13 05:06

Shafi


3 Answers

Don't query RDF (or OWL) with XPath

There's already an accepted answer, but I wanted to elaborate on @Michael's comment on the question. It's a very bad idea to try to work with RDF as XML (and hence, the RDF serialization of an OWL ontology), and the reason for that is very simple: the same RDF graph can be serialized as lots of different XML documents. In the question, all that's being asked for the is rdfs:label of an owl:Ontology element, so how much could go wrong? Well, here are two serializations of the ontology.

The first is fairly human readable, and was generated by the OWL API when I saved the ontology using the Protégé ontology editor. The query in the accepted answer would work on this, I think.

<rdf:RDF xmlns="http://www.example.com/labelledOnt#"
     xml:base="http://www.example.com/labelledOnt"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <owl:Ontology rdf:about="http://www.example.com/labelledOnt">
        <rdfs:label>Here is a label on the Ontology.</rdfs:label>
    </owl:Ontology>
</rdf:RDF>

Here is the same RDF graph using fewer of the fancy features available in the RDF/XML encoding. This is the same RDF graph, and thus the same OWL ontology. However, there is no owl:Ontology XML element here, and the XPath query will fail.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns="http://www.example.com/labelledOnt#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" > 
  <rdf:Description rdf:about="http://www.example.com/labelledOnt">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Ontology"/>
    <rdfs:label>Here is a label on the Ontology.</rdfs:label>
  </rdf:Description>
</rdf:RDF>

You cannot reliably query an RDF graph in RDF/XML serialization by using typical XML-processing techniques.

Query RDF with SPARQL

Well, if we cannot query reliably query RDF with XPath, what are we supposed to use? The standard query language for RDF is SPARQL. RDF is a graph-based representation, and SPARQL queries include graph patterns that can match a graph.

In this case, the pattern that we want to match in a graph consists of two triples. A triple is a 3-tuple of the form [subject,predicate,object]. Both triples have the same subject.

  • The first triple says that the subject is of type owl:Ontology. The relationship “is of type” is rdf:type, so the first triple is [?something,rdf:type,owl:Ontology].
  • The second triple says that subject (now known to be an ontology) has an rdfs:label, and that's the value that we're interested in. The corresponding triple is [?something,rdfs:label,?label].

In SPARQL, after defining the necessary prefixes, we can write the following query.

PREFIX owl: <http://www.w3.org/2002/07/owl#>                                                                                                                                                   
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>                                                                                                                                           

SELECT ?label WHERE {                                                                                                                                                                          
  ?ontology a owl:Ontology ;                                                                                                                                                                   
            rdfs:label ?label .                                                                                                                                                                
}

(Note that because rdf:type is so common, SPARQL includes a as an abbreviation for it. The notation s p1 o1; p2 o2 . is just shorthand for the two-triple pattern s p1 o1 . s p2 o2 ..)

You can run SPARQL queries against your model in Jena either programmatically, or using the command line tools. If you do it programmatically, it is fairly easy to get the results out. To confirm that this query gets the value we're interested in, we can use Jena's command line for arq to test it out.

$ arq  --data labelledOnt.owl --query getLabel.sparql
--------------------------------------
| label                              |
======================================
| "Here is a label on the Ontology." |
--------------------------------------
like image 152
Joshua Taylor Avatar answered Oct 20 '22 21:10

Joshua Taylor


as xpath does not know the namespaces you are using. try using:

"/*[local-name()='RDF']/*[local-name()='Ontology']/*[local-name()='label']/text()"

local name will ignore the namespaces and will work (for the first instance of this that it finds)

like image 2
Sean F Avatar answered Oct 20 '22 20:10

Sean F


You would be able to use namespaces in query if you implement javax.xml.namespace.NamespaceContext for yourself. Please have a look at this answer https://stackoverflow.com/a/5466030/1443529, this explains how to get it done.

like image 1
cpz Avatar answered Oct 20 '22 19:10

cpz