SPARQL XML results from DBpedia and Jena

I get the following XML from the DBpedia SPARQL endpoint:

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="onto"/>
  </head>
  <results>
    <result>
      <binding name="onto">
        <uri>http://www.w3.org/2002/07/owl#Thing</uri>
      </binding>
    </result>
    <result>
      <binding name="onto">
        <uri>http://www.w3.org/2002/07/owl#Thing</uri>
      </binding>
    </result>
    <result>
      <binding name="onto">
        <uri>http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing</uri>
      </binding>
    </result>
  </results>
</sparql>

When I read it with Jena and I try to scan it:

  // xmlCode holds the XML document shown above
  ResultSet r = ResultSetFactory.fromXML( xmlCode );
  while ( r.hasNext() ) {
    QuerySolution soln = r.next();
    ...
  }

I always get the following exception:

com.hp.hpl.jena.sparql.resultset.ResultSetException: End of document while processing solution
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.staxError(XMLInputStAX.java:503)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:413)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:218)

Is this a Jena bug, or is something else going on?

EDIT: For completeness, here is a thread about this error:

With some help from the bio2rdf mailing list, we were able to track the error down a bit more.

Arq 2.8.3 works fine.
Arq 2.8.4 fails with the described error.
Arq 2.8.5 fails with the described error.

So I guess I will keep Arq 2.8.3 for my tests. Let me know if I can help to debug this error a bit more.

Weird. The error is coming from the StAX parser; all the base-level XML parsing is subcontracted to Woodstox. It's almost as if it is reading faster than the input arrives and sees EOF rather than blocking for new input. I tried reading the whole stream and then parsing the bytes read, and it works OK. Why 2.8.3 should be different is unclear to me at the moment; it might just be timing.

Workaround: switch the XML parsers with:

ARQ.getContext().setTrue(ARQ.useSAX) ;

before making the call to QueryExecutionFactory.sparqlService.

Andy
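
The "read the whole stream, then parse the bytes" approach mentioned in the quoted thread can be sketched roughly as follows. This is only an illustration under some assumptions: it uses the JDK's built-in StAX parser rather than Jena or Woodstox, and the names BufferedSparqlXml, readFully and extractUris are hypothetical. The point is just that the parser is handed a complete in-memory document, so it cannot hit a premature end of input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of Jena or ARQ.
final class BufferedSparqlXml {
    static final String SPARQL_RESULTS_NS = "http://www.w3.org/2005/sparql-results#";

    // Drain the stream completely before any parsing starts.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Collect the text of every <uri> element in the SPARQL results namespace.
    static List<String> extractUris(byte[] xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xml));
        List<String> uris = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "uri".equals(reader.getLocalName())
                        && SPARQL_RESULTS_NS.equals(reader.getNamespaceURI())) {
                    uris.add(reader.getElementText().trim());
                }
            }
        } finally {
            reader.close();
        }
        return uris;
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for the endpoint response; in practice the
        // InputStream would come from the HTTP connection.
        String sample = "<?xml version=\"1.0\"?>"
                + "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">"
                + "<head><variable name=\"onto\"/></head>"
                + "<results><result><binding name=\"onto\">"
                + "<uri>http://www.w3.org/2002/07/owl#Thing</uri>"
                + "</binding></result></results></sparql>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(extractUris(readFully(in)));
    }
}
```

With Jena itself, the same idea would presumably mean buffering the response first and only then handing the complete document to the result-set reader.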

Mulone asked Nov 06 '22 00:11


1 Answer

The XML results look perfectly valid (and parse with other tools without issue), so this may be an issue with Jena. That said, given the relative maturity of the Jena framework, I'd be surprised if it failed on such a simple and obviously valid input.
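
As one way to check the "parses with other tools" claim independently of Jena, here is a minimal sketch that feeds the results to the JDK's built-in DOM parser and counts the result elements. The class name SparqlXmlCheck and the method countResults are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of Jena or ARQ.
final class SparqlXmlCheck {
    static final String SPARQL_RESULTS_NS = "http://www.w3.org/2005/sparql-results#";

    // Parses the document (throws if it is not well-formed) and returns
    // the number of <result> elements in the SPARQL results namespace.
    static int countResults(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagNameNS(SPARQL_RESULTS_NS, "result").getLength();
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for the XML in the question.
        String sample = "<?xml version=\"1.0\"?>"
                + "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">"
                + "<head><variable name=\"onto\"/></head>"
                + "<results><result><binding name=\"onto\">"
                + "<uri>http://www.w3.org/2002/07/owl#Thing</uri>"
                + "</binding></result></results></sparql>";
        System.out.println(countResults(sample));
    }
}
```

If a check like this succeeds on the exact string passed to Jena, the document itself is unlikely to be the problem.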

How exactly are you reading the XML from DBpedia? I'd suspect that the bug has to do with how the XML string is retrieved and formatted in your Java code rather than with Jena's code.

Also, why do it this way? Why not use ARQ's QueryExecutionFactory.sparqlService(String service, String query) method?

RobV answered Nov 21 '22 23:11