Graph databases and RDF triplestores: storage of graph data in Python

I need to develop a graph database in Python. (I would enjoy it if anybody could join me in the development. I already have a bit of code, but I would gladly discuss it.)

I did my research on the internet. In Java, Neo4j is a candidate, but I was not able to find anything about its actual disk storage. In Python, there are many graph data models (see this pre-PEP proposal), but none of them satisfies my need to store to and retrieve from disk.

I do know about triplestores, however. Triplestores are basically RDF databases, so a graph data model could be mapped to RDF and stored, but I am generally uneasy (mainly due to lack of experience) about this solution. One example is Sesame. The fact is that, in any case, you have to convert from the in-memory graph representation to the RDF representation and vice versa, unless the client code wants to hack on the RDF document directly, which is unlikely. It would be like handling DB tuples directly instead of creating an object.

What is the state of the art for storage and retrieval (à la DBMS) of graph data in Python at the moment? Would it make sense to start developing an implementation, hopefully with the help of someone interested in it, and in collaboration with the proposers of the Graph API PEP? Please note that this is going to be part of my job for the next few months, so my contribution to this eventual project is pretty damn serious ;)

Edit: I also found directededge, but it appears to be a commercial product.

asked Aug 19 '09 23:08

Stefano Borini

1 Answer

I have used both Jena, which is a Java framework, and AllegroGraph (Lisp, Java, Python bindings). Jena has sister projects for storing graph data and has been around for a long, long time. AllegroGraph is quite good and has a free edition. I would suggest it because it is easy to install, free, and fast, and you could be up and going in no time.

The power you would get from learning a little RDF and SPARQL may very well be worth your while. If you already know SQL, you are off to a great start. Being able to query your graph using SPARQL would yield some great benefits. Serializing to RDF triples would be easy, and some of the file formats are very simple (N-Triples, or NT, for instance).

I'll give an example. Let's say you have the following graph, written as node-edge-node ids:

1 <- 2 -> 3
3 <- 4 -> 5

These are already in subject-predicate-object form, so just slap some URI notation on them, load them into the triplestore, and query at will via SPARQL. Here it is in NT format:

<http://mycompany.com#1> <http://mycompany.com#2> <http://mycompany.com#3> .
<http://mycompany.com#3> <http://mycompany.com#4> <http://mycompany.com#5> .
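If the graph already lives in Python, producing that NT text takes only a few lines. The following is a minimal sketch using only the standard library; the function name `to_ntriples` and the `(source, edge, target)` tuple layout are assumptions made for illustration, not part of any triplestore API. It also assumes the ids need no escaping inside a URI.

```python
BASE = "http://mycompany.com#"  # namespace taken from the example above


def to_ntriples(edges, base=BASE):
    """Render (source, edge, target) id triples as N-Triples text.

    Hypothetical helper: every id is turned into a URI by appending it
    to `base`, and each triple becomes one line terminated by " .".
    """
    return "".join(
        f"<{base}{source}> <{base}{edge}> <{base}{target}> .\n"
        for source, edge, target in edges
    )


# The two node-edge-node rows from the example graph.
edges = [(1, 2, 3), (3, 4, 5)]
nt_document = to_ntriples(edges)
print(nt_document, end="")
```

The resulting string can be written to a `.nt` file and loaded with whatever bulk-load facility the chosen triplestore offers.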

Now query for all nodes two hops from node 1:

SELECT ?node
WHERE {
    <http://mycompany.com#1> ?p1 ?o1 .
    ?o1 ?p2 ?node .
}

This would of course yield <http://mycompany.com#5>.
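To make the SQL analogy concrete, here is a rough sketch of the same two-hop lookup against a plain `sqlite3` table of triples. This is not a triplestore and not how AllegroGraph or Jena work internally; it is only a standard-library stand-in showing that the SPARQL pattern above corresponds to a self-join. The table layout and the names `open_store`, `add_triples`, and `two_hops_from` are assumptions made for this example.

```python
import sqlite3

BASE = "http://mycompany.com#"  # namespace taken from the example above


def open_store(path=":memory:"):
    """Open (or create) a triple table; pass a file path to persist to disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        "s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL, "
        "UNIQUE (s, p, o))"
    )
    return conn


def add_triples(conn, triples):
    """Insert (subject, predicate, object) URIs, skipping duplicates."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO triples (s, p, o) VALUES (?, ?, ?)",
            triples,
        )


def two_hops_from(conn, start):
    """Objects reachable in exactly two hops from `start`.

    Mirrors the SPARQL pattern: t1 is `<start> ?p1 ?o1`,
    t2 is `?o1 ?p2 ?node`. DISTINCT is added here for convenience.
    """
    rows = conn.execute(
        "SELECT DISTINCT t2.o FROM triples AS t1 "
        "JOIN triples AS t2 ON t2.s = t1.o "
        "WHERE t1.s = ? ORDER BY t2.o",
        (start,),
    )
    return [row[0] for row in rows]


conn = open_store()
add_triples(conn, [
    (BASE + "1", BASE + "2", BASE + "3"),
    (BASE + "3", BASE + "4", BASE + "5"),
])
result = two_hops_from(conn, BASE + "1")
print(result)  # ['http://mycompany.com#5']
```

A real triplestore adds indexing, inference, and a full SPARQL engine on top of this idea, so the sketch is best read as an illustration of the query shape rather than a replacement.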

Another candidate would be Mulgara, written in pure Java. Since you seem more interested in Python, though, I think you should take a look at AllegroGraph first.

answered Sep 17 '22 17:09

harschware

Tags: python, database, graph, graph-databases
