How to optimize graph traversals in ArangoDB?

Q: What is the maximum depth of a traversal in ArangoDB?

We use a max depth of 2. We follow only in OUTBOUND direction of edges.
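
For illustration, a traversal limited to depth 2 that follows edges only in the OUTBOUND direction could look like the sketch below. It assumes the native traversal syntax available since ArangoDB 2.8, and the start vertex "vertices/an_id" and the edge collection name edges are hypothetical names borrowed from the question further down, not part of this answer.

```aql
// Assumed names: start vertex "vertices/an_id", edge collection "edges".
// Visits every vertex reachable within 1 or 2 outbound hops.
FOR v, e, p IN 1..2 OUTBOUND "vertices/an_id" edges
  // LENGTH(p.edges) is the number of hops taken to reach v.
  RETURN { vertex: v._key, depth: LENGTH(p.edges) }
```

The 1..2 range is what caps the depth; the edge direction is fixed by the OUTBOUND keyword.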

Q: How do you make a graph on ArangoDB?

You can use the add samples tab in the create graph window in the web interface, or load the module @arangodb/graph-examples/example-graph in arangosh and use it to create instances of these graphs in your ArangoDB.

Q: Which of the following formats are supported in ArangoDB?

The documents you can store in ArangoDB closely follow the JSON format, although they are stored in a binary format called VelocyPack.

Q: Is ArangoDB a graph database?

Unlike many NoSQL databases, ArangoDB is a native multi-model database. You can store your data as key/value pairs, graphs, or documents and access any or all of it using a single declarative query language.

Tags:

graph-databases

arangodb

aql

I primarily intended to ask this question: "Is ArangoDB a true graph database?"

But that question would sound quite offensive.

You people at triAGENS did a really great job in creating a "multi-paradigm" database. As a user of PostgreSQL, PostGIS, MongoDB and Neo4j/Titan, I really appreciate seeing an "all-in-one" solution :)

But the question remains: creating a graph in ArangoDB requires two separate collections, one for edges and one for vertices. As far as I understand, this already means that vertices and their related edges are not "physically" neighbors.

Moreover, even after creating the appropriate index, I'm facing serious performance issues when running this kind of query in Gremlin:

g.v('an_id').out('likes').in('likes').count()

It returns a result after about 3 seconds (perceived time).

I assumed I had misunderstood how Gremlin and Blueprints/ArangoDB work, so I rewrote the same query in AQL:

LET lst = (
    FOR e1 IN NEIGHBORS(vertices, edges, "an_id", "outbound", [ { "$label": "likes" } ])
        FOR e2 IN NEIGHBORS(vertices, edges, e1.edge._to, "inbound", [ { "$label": "likes" } ])
            RETURN 1
)
RETURN LENGTH(lst)

This gives me a delay of the same order of magnitude.

When I run the same query on a Titan or Neo4j database (with the very same data), it returns almost immediately (perceived time: <200 ms).

So it seems to me that ArangoDB's graph features are a "smart graph layer" on top of a "traditional document database", and that ArangoDB is not a "native" graph database.

To confirm this feeling, I transformed the data, loaded it into PostgreSQL and ran an equivalent query (with a multi-table JOIN, as you can assume), and got execution times similar to ArangoDB's.

Did I do something wrong (in the AQL query)?

Is there a way to optimize the database to get better traversal times?

In PostgreSQL, conceptually, I would mix edges and nodes and use the CLUSTER command to physically order the data. Can something similar be done in ArangoDB? (I assume it would be hard, as it would involve "interlacing" edges and nodes; just an intuition.)

asked Jan 09 '14 12:01

Raphaël Braud

1 Answer

I am a core developer of ArangoDB. Could you give me a bit more information on the dimensions of the data you are using?

Number of vertices
Number of edges

Then we can create our own setup with equal dimensions and optimize it.
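
Both numbers can be read directly from the collections. A minimal sketch, assuming the collections are named vertices and edges as in the AQL query from the question (adjust the names to your setup):

```aql
// Assumed collection names: "vertices" and "edges".
// LENGTH() applied to a collection returns its document count.
RETURN {
  vertices: LENGTH(vertices),
  edges: LENGTH(edges)
}
```

This returns the totals only. The share of edges labeled "likes" would need a separate filtered count.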

answered Sep 19 '22 20:09

mchacki
