
About DBPedia Access

I am very new to DBpedia and don't know where or how to start. From my research, I understand that the data can be accessed with the SPARQL query language (for example, via Apache Jena). So I downloaded the .ttl files for the Ontology Infobox Properties; after extraction the file is almost 2 GB, and that is where my problem started: none of my editors are able to open it. My sample program to access this file is here:

package jena.tutorial;

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.util.FileManager;

public class OntologyExample {
    public static void main(String[] args) {
        FileManager.get().addLocatorClassLoader(
                OntologyExample.class.getClassLoader());
        Model model = FileManager.get().loadModel(
                "D:\\Dell XPS\\DBPEDIA\\instance_types_en.ttl\\instance_types_en.ttl");

        String q = "SELECT * WHERE { "
                + "?e <http://dbpedia.org/ontology/series> <http://dbpedia.org/resource/The_Sopranos> ."
                + "?e <http://dbpedia.org/ontology/releaseDate> ?date"
                + "?e <http://dbpedia.org/ontology/episodeNumber> ?number "
                + "?e <http://dbpedia.org/ontology/seasonNumber> ?season"
                + " }" + "ORDER BY DESC(?date)";

        Query query = QueryFactory.create(q);
        QueryExecution queryExecution = QueryExecutionFactory.create(query, model);
        ResultSet resultSet = queryExecution.execSelect();
        ResultSetFormatter.out(System.out, resultSet, query);
        queryExecution.close();
    }
}

So the input for this program is that 2 GB file. When I run it, it throws an exception like this:

Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException: GC overhead limit exceeded
at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:63)
at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:33)
at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:119)
at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:84)
at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:268)
at com.hp.hpl.jena.util.FileManager.readModelWorker(FileManager.java:403)
at com.hp.hpl.jena.util.FileManager.loadModelWorker(FileManager.java:306)
at com.hp.hpl.jena.util.FileManager.loadModel(FileManager.java:258)
at jena.tutorial.OntologyExample.main(OntologyExample.java:18)

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.String.<init>(Unknown Source)
at org.apache.jena.iri.impl.LexerPath.yytext(LexerPath.java:420)
at org.apache.jena.iri.impl.AbsLexer.rule(AbsLexer.java:81)
at org.apache.jena.iri.impl.LexerPath.yylex(LexerPath.java:711)
at org.apache.jena.iri.impl.AbsLexer.analyse(AbsLexer.java:52)
at org.apache.jena.iri.impl.Parser.<init>(Parser.java:108)
at org.apache.jena.iri.impl.IRIImpl.<init>(IRIImpl.java:65)
at org.apache.jena.iri.impl.AbsIRIImpl.create(AbsIRIImpl.java:692)
at org.apache.jena.iri.IRI.resolve(IRI.java:432)
at com.hp.hpl.jena.n3.IRIResolver.resolve(IRIResolver.java:167)
at com.hp.hpl.jena.n3.turtle.ParserBase._resolveIRI(ParserBase.java:198)
at com.hp.hpl.jena.n3.turtle.ParserBase.resolveIRI(ParserBase.java:192)
at com.hp.hpl.jena.n3.turtle.ParserBase.resolveQuotedIRI(ParserBase.java:183)
at com.hp.hpl.jena.n3.turtle.parser.TurtleParser.IRI_REF(TurtleParser.java:737)
at com.hp.hpl.jena.n3.turtle.parser.TurtleParser.IRIref(TurtleParser.java:680)
at com.hp.hpl.jena.n3.turtle.parser.TurtleParser.GraphTerm(TurtleParser.java:496)
at com.hp.hpl.jena.n3.turtle.parser.TurtleParser.VarOrTerm(TurtleParser.java:420)
at com.hp.hpl.jena.n3.turtle.parser.TurtleParser.TriplesSameSubject(TurtleParser.java:150)
at com.hp.hpl.jena.n3.turtle.parser.TurtleParser.Statement(TurtleParser.java:97)
at com.hp.hpl.jena.n3.turtle.parser.TurtleParser.parse(TurtleParser.java:67)
at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:49)
... 8 more

I am running this code from Eclipse, and here are my eclipse.ini settings:

org.eclipse.epp.package.jee.product
--launcher.defaultAction
openFile
--launcher.XXMaxPermSize
512M
-showsplash
org.eclipse.platform
--launcher.XXMaxPermSize
512m
--launcher.defaultAction
openFile
-vmargs
-Dosgi.requiredJavaVersion=1.5
-Xms1024m
-Xmx2048m

So my problems here are:

  1. How can I access files this large?
  2. How can I use DBpedia properly?

Please help; I am stuck here. I am doing a project on DBpedia.

Amar asked Feb 24 '26 17:02


1 Answer

You can use Jena's ARQ to run SPARQL queries against DBpedia data, and if you are going to do lots of querying and data processing, it is useful to download the data and work with it locally. To do that with data as large as DBpedia's, you probably shouldn't try to load it into an in-memory model; instead, use TDB and Fuseki to set up a SPARQL endpoint that you can run queries against. This has been discussed for a different dataset in this answer.
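As a rough sketch of that workflow (the store location and dataset name are placeholders, and the exact command names may differ between Jena/Fuseki versions):

```shell
# Load the Turtle dump into an on-disk TDB store instead of an
# in-memory model; tdbloader streams the file, so it does not need
# to hold the whole 2 GB in heap at once.
tdbloader --loc=/data/dbpedia-tdb instance_types_en.ttl

# Serve the same store over SPARQL with Fuseki; queries can then be
# sent to http://localhost:3030/ds/query
fuseki-server --loc=/data/dbpedia-tdb /ds
```

Once loaded, the data persists on disk, so the expensive load step happens only once rather than on every run of your program.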

However, since you're just getting started, it's probably much easier to work with the public DBpedia SPARQL endpoint, where you can type in SPARQL queries and retrieve results in a variety of formats. The query in your question was a bit malformed (it was missing the `.` separators between triple patterns), but easy enough to clean up; the corrected, working query follows.

SELECT * WHERE {
    ?e <http://dbpedia.org/ontology/series> <http://dbpedia.org/resource/The_Sopranos>  .
    ?e <http://dbpedia.org/ontology/releaseDate> ?date .
    ?e <http://dbpedia.org/ontology/episodeNumber> ?number .
    ?e <http://dbpedia.org/ontology/seasonNumber> ?season .
}
ORDER BY DESC(?date)

SPARQL results
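You can also call the public endpoint programmatically without Jena at all, by sending the query as an HTTP GET parameter. A minimal sketch using only the standard library follows; the endpoint address and the `format` parameter value are assumptions based on the endpoint's usual defaults, so adjust them if the service changes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DbpediaEndpointUrl {

    // Assumed address of the public DBpedia SPARQL endpoint.
    static final String ENDPOINT = "http://dbpedia.org/sparql";

    // URL-encode the query text and append it as the "query" parameter,
    // requesting JSON results via the (pre-encoded) "format" parameter.
    public static String buildUrl(String sparql) throws UnsupportedEncodingException {
        return ENDPOINT + "?query=" + URLEncoder.encode(sparql, "UTF-8")
                + "&format=application%2Fsparql-results%2Bjson";
    }

    public static void main(String[] args) throws Exception {
        String q = "SELECT * WHERE { ?s ?p ?o } LIMIT 5";
        // Fetch this URL with any HTTP client to get the results as JSON.
        System.out.println(buildUrl(q));
    }
}
```

Opening the printed URL in a browser or passing it to an HTTP client returns the query results, which keeps your own program free of any large local data files.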

The DBpedia wiki has a whole page about accessing DBpedia online, Accessing the DBpedia Data Set over the Web, which will give you an idea of how you can access the data. Another wiki page, The DBpedia Data Set, describes in much more detail what data is available.

Joshua Taylor answered Feb 27 '26 01:02