Triplestore for Large Datasets [closed]

I'm looking for a good triplestore to use with large datasets. It should:

  • Scale well (millions of triples)
  • Have a Java interface
asked Feb 07 '11 at 12:02 by myahya


People also ask

Why use triple store?

Triplestores can be more flexible, and cheaper to adapt, than a relational database, because new kinds of statements can be added without redesigning a fixed table schema. The RDF database, often called a semantic graph database, can also handle powerful semantic queries and use inference to uncover new information from the existing relations.
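To make "using inference" a little more concrete, here is a minimal Java sketch of forward chaining over two RDFS-style rules (subclass transitivity and type propagation). It is a toy, not how any particular triplestore implements reasoning; the `Triple` record, the `ex:` names and the short strings `rdf:type` / `rdfs:subClassOf` standing in for full URIs are illustrative assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// A statement in Subject-Predicate-Object form; short strings stand in for URIs.
record Triple(String s, String p, String o) {}

class SubClassInference {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    // Repeatedly applies two rules until no new triples are derived:
    //   (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    //   (x type A),       (A subClassOf B)  =>  (x type B)
    static Set<Triple> infer(Set<Triple> input) {
        Set<Triple> all = new HashSet<>(input);
        boolean changed = true;
        while (changed) {
            Set<Triple> derived = new HashSet<>();
            for (Triple t1 : all) {
                for (Triple t2 : all) {
                    // t2 must be a subClassOf triple whose subject is t1's object
                    if (!t2.p().equals(SUBCLASS) || !t1.o().equals(t2.s())) continue;
                    if (t1.p().equals(SUBCLASS) || t1.p().equals(TYPE)) {
                        derived.add(new Triple(t1.s(), t1.p(), t2.o()));
                    }
                }
            }
            // addAll returns true only if at least one derived triple was new
            changed = all.addAll(derived);
        }
        return all;
    }

    public static void main(String[] args) {
        Set<Triple> facts = Set.of(
                new Triple("ex:Jessica", TYPE, "ex:Lecturer"),
                new Triple("ex:Lecturer", SUBCLASS, "ex:Staff"));
        Set<Triple> closed = infer(facts);
        System.out.println(closed.contains(new Triple("ex:Jessica", TYPE, "ex:Staff"))); // true
    }
}
```

The nested loop is quadratic per pass, which is fine for illustration; stores that reason over millions of triples use indexed rule evaluation or query-time rewriting instead.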

How do triple stores work?

Triplestores use URIs, which means they support querying and reasoning about the Semantic Web, an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C) whose goal is to make Internet data machine-readable. Unlike relational databases, which store data in tables, triplestores store data as statements in Subject-Predicate-Object form, such as “Jessica teaches Computer Science”; each statement is called a triple.
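A minimal in-memory sketch of the Subject-Predicate-Object model in Java, assuming plain strings in place of full URIs and a hypothetical `TinyTripleStore` class. Passing `null` to `match` works roughly like a variable in a single SPARQL triple pattern; a real store would answer such patterns from indexes rather than a linear scan.

```java
import java.util.ArrayList;
import java.util.List;

// One statement in Subject-Predicate-Object form.
record Triple(String subject, String predicate, String object) {}

// Toy store: a list of triples plus a pattern query where null is a wildcard.
class TinyTripleStore {
    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    // Returns every triple matching the pattern, in insertion order.
    List<Triple> match(String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.subject()))
                    && (p == null || p.equals(t.predicate()))
                    && (o == null || o.equals(t.object()))) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TinyTripleStore store = new TinyTripleStore();
        store.add("ex:Jessica", "ex:teaches", "ex:ComputerScience");
        store.add("ex:Tom", "ex:teaches", "ex:Mathematics");
        // Pattern (?s ex:teaches ?o): who teaches what?
        for (Triple t : store.match(null, "ex:teaches", null)) {
            System.out.println(t.subject() + " teaches " + t.object());
        }
    }
}
```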

How does RDF store its data?

RDFStore implements a generic hashed data storage that allows RDF models, resources, properties and property values to be serialised either to disk or to in-memory data structures. It supports several different persistent storage models, such as SDBM, BerkeleyDB (standard and Sleepycat) and DBMS.
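RDFStore's own hashed storage format is not reproduced here. As a hedged illustration of what serialising an RDF model can look like, the sketch below writes statements in the W3C N-Triples text format (one `<subject> <predicate> <object> .` per line) to any `java.io.Writer`, so the target can be a `StringWriter` in memory or a file writer on disk. The `Statement` record and `NTriplesWriter` class are hypothetical; literals are plain strings without datatypes or language tags, and IRIs are not validated.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

// The object is either an IRI or a plain string literal.
record Statement(String subjectIri, String predicateIri, String object, boolean objectIsLiteral) {}

class NTriplesWriter {
    // Escapes backslash, double quote, newline and carriage return inside a literal.
    // Backslash goes first so later escapes are not doubled.
    static String escapeLiteral(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    // Writes one N-Triples line per statement.
    static void write(List<Statement> statements, Writer out) throws IOException {
        for (Statement st : statements) {
            String obj = st.objectIsLiteral()
                    ? "\"" + escapeLiteral(st.object()) + "\""
                    : "<" + st.object() + ">";
            out.write("<" + st.subjectIri() + "> <" + st.predicateIri() + "> " + obj + " .\n");
        }
    }

    public static void main(String[] args) throws IOException {
        List<Statement> data = List.of(
                new Statement("http://example.org/Jessica", "http://example.org/teaches",
                        "http://example.org/ComputerScience", false),
                new Statement("http://example.org/Jessica", "http://example.org/name",
                        "Jessica", true));
        StringWriter buffer = new StringWriter(); // in-memory target; a FileWriter would go to disk
        write(data, buffer);
        System.out.print(buffer);
    }
}
```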

Is Neo4j a triple store?

Since the Neo4j graph database is not a triple store, it is not equipped with a SPARQL query engine. However, Neo4j offers Cypher and for many semantic applications it should be possible to translate SPARQL to Cypher queries.


2 Answers

You should consider using the OpenLink Virtuoso store. It is available under an open-source license and scales to billions of triples. You can use it via the Sesame and Jena APIs.
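Besides the Sesame and Jena client APIs, a SPARQL store can also be queried over plain HTTP using the SPARQL 1.1 Protocol. A minimal sketch with only the JDK's `java.net.http.HttpClient` (Java 11+) follows; the endpoint `http://localhost:8890/sparql` is an assumption (a common default for a local Virtuoso install, adjust for yours), and the `SparqlHttpQuery` class is hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class SparqlHttpQuery {
    // Assumed endpoint; not taken from the answer above.
    static final String ENDPOINT = "http://localhost:8890/sparql";

    // Builds a "query via GET" request that asks for SPARQL JSON results.
    static HttpRequest buildRequest(String endpoint, String sparql) {
        // URLEncoder encodes spaces as '+'; switch them to %20 for a URI query string.
        // A literal '+' in the query is already encoded as %2B, so it is unaffected.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(ENDPOINT, query), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Running `main` needs a live endpoint; the Jena or Sesame clients add result parsing and connection handling on top of this same protocol.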

See here for an overview of large-scale triple stores. Virtuoso is definitely easier to set up than BigData. Besides those, I have used the Sesame NativeStore, which doesn't scale very well.

4store is also a good choice, although I haven't used it. One benefit of Virtuoso over 4store is that you can easily mix standard relational models with RDF, since under the hood Virtuoso is a relational database.

answered Oct 01 '22 at 04:10 by Timo Westkämper


4store: Scalable RDF storage

Quoting 4store Web ...

4store's main strengths are its performance, scalability and stability. It does not provide many features over and above RDF storage and SPARQL queries, but if you are looking for a scalable, secure, fast and efficient RDF store, then 4store should be on your shortlist.

Personally, I have tested 4store with very large databases (up to 2 billion triples) with very good results. 4store is written in C, runs on 64-bit Linux/Unix platforms, and the current version, 1.1.1, partially implements SPARQL 1.1.

4store can be deployed on a cluster of commodity servers, which may boost the performance of your queries, and assertion throughput can reach up to 100 KTriples/second. Even on a single server, though, you will get quite decent performance.
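As a back-of-the-envelope check, combining the 100 KTriples/second peak with the 2-billion-triple databases mentioned above gives 2,000,000,000 / 100,000 = 20,000 seconds, roughly 5.6 hours. The tiny Java helper below (hypothetical `LoadTimeEstimate`) just does that division; it assumes the peak rate is sustained for the whole load, so treat the result as an optimistic lower bound rather than a measured figure.

```java
class LoadTimeEstimate {
    // Idealized load time in hours: triples / sustained rate.
    // Ignores indexing overhead, I/O stalls and slowdown as the store grows.
    static double hours(long triples, double triplesPerSecond) {
        return triples / triplesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        // 2 billion triples at 100 KTriples/second
        System.out.printf("%.1f hours%n", hours(2_000_000_000L, 100_000)); // 5.6 hours
    }
}
```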

Here at the University of Southampton, it is our choice for very big datasets in research projects and also for our Webmaster team; see Data Stores for Southampton and ECS Open Data.

The 4store Client Libraries page lists all the libraries you can use to query and administer 4store. Also, 4store's IRC channel has an active community of users who will help if you run into any issues.

If you are a Linux/Unix user 4store is definitely a good choice.

answered Oct 01 '22 at 02:10 by Manuel Salvadores