
Representing and performing I/O on graphs and subgraphs

I have a problem in which I need to perform CRUD operations on cyclic graphs. I know that there are a bunch of graph databases out there, but I have a specific set of use cases which those databases do not support (or at least I'm not aware that they do).

Following are my constructs:

  • Node: Can have multiple sources and targets
  • Directed edge: Connects two nodes
  • Node Group: Multiple nodes (connected with edges) forming a group (simply put, it's a smaller graph)
  • Directed graph: Comprises multiple nodes, node groups and edges. The graph can be cyclic.
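The constructs above can be sketched in memory as follows. This is only an illustration under assumptions: the `Graph` class, its method names, and the choice to store a node group as a named subset of nodes are hypothetical and not part of the question.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Directed graph stored as adjacency sets; cycles are allowed."""

    targets: dict[str, set[str]] = field(default_factory=dict)  # node -> outgoing neighbours
    sources: dict[str, set[str]] = field(default_factory=dict)  # node -> incoming neighbours
    groups: dict[str, set[str]] = field(default_factory=dict)   # group name -> member nodes

    def add_node(self, node: str) -> None:
        self.targets.setdefault(node, set())
        self.sources.setdefault(node, set())

    def add_edge(self, src: str, dst: str) -> None:
        # A directed edge connects two nodes; both ends are created on demand.
        self.add_node(src)
        self.add_node(dst)
        self.targets[src].add(dst)
        self.sources[dst].add(src)

    def add_group(self, name: str, members: set[str]) -> None:
        # Assumption: a node group is a named subset of this graph's nodes.
        self.groups[name] = set(members)

    def group_edges(self, name: str) -> set[tuple[str, str]]:
        # Edges of the smaller graph: both endpoints are group members.
        members = self.groups[name]
        return {(s, t) for s in members for t in self.targets[s] if t in members}

    def reachable(self, start: str) -> list[str]:
        """Breadth-first traversal; the visited set stops cycles from looping."""
        seen = {start}
        order = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in sorted(self.targets[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return order


g = Graph()
g.add_edge("a", "b")
g.add_edge("b", "c")
g.add_edge("c", "a")  # closes a cycle
g.add_group("loop", {"a", "b"})
print(g.reachable("a"), g.group_edges("loop"))
```

Keeping both `sources` and `targets` makes "multiple sources and targets" cheap to query in either direction, at the cost of writing each edge twice.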

These are the operations I need to support:

  • I can simply create a node by defining the incoming and outgoing edge definitions.
  • I can create a simple graph by adding nodes and connecting them with edges.
  • I can perform standard graph traversals.
  • I can group some nodes of a graph into a Node Group and then use multiple instances of that Node Group (just like a node) in another, bigger graph. This can create complex hierarchies.
  • I can create multiple graphs which in turn use any of the above constructs.
  • I can change Node and Node Group definitions, which can change the structure of a graph. If I change a Node or Node Group definition, every instance of it in every graph should be updated too.
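One way to get the "update every instance" behaviour is to have instances store only a reference to their definition and expand it at read time, so a definition change is a single write. The sketch below assumes that approach; `Definition`, `Registry`, and `expand` are hypothetical names, and it assumes that definition nesting itself is acyclic (a group never contains itself), even though the edges inside a group may form cycles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Definition:
    """Shared template for a Node or a Node Group (hypothetical shape)."""

    name: str
    # For a Node Group: instance name -> name of the definition it instantiates.
    # Empty for a plain Node.
    children: dict[str, str] = field(default_factory=dict)
    # Directed edges between child instances, by instance name.
    edges: set[tuple[str, str]] = field(default_factory=set)


class Registry:
    def __init__(self) -> None:
        self.definitions: dict[str, Definition] = {}

    def define(self, definition: Definition) -> None:
        self.definitions[definition.name] = definition

    def expand(self, def_name: str, path: str = "") -> tuple[list[str], list[tuple[str, str]]]:
        """Flatten a group definition into leaf instance paths and edges.

        Edges stay at the level where they were defined, so an edge to a
        nested group points at the group's path, not at its inner nodes.
        """
        definition = self.definitions[def_name]
        nodes: list[str] = []
        edges: list[tuple[str, str]] = []
        for inst, child_def in sorted(definition.children.items()):
            child_path = f"{path}/{inst}" if path else inst
            if self.definitions[child_def].children:
                sub_nodes, sub_edges = self.expand(child_def, child_path)
                nodes += sub_nodes
                edges += sub_edges
            else:
                nodes.append(child_path)
        for src, dst in sorted(definition.edges):
            prefix = f"{path}/" if path else ""
            edges.append((prefix + src, prefix + dst))
        return nodes, edges


reg = Registry()
reg.define(Definition("Task"))
reg.define(Definition("Pair", {"x": "Task", "y": "Task"}, {("x", "y"), ("y", "x")}))
reg.define(Definition("Pipeline", {"p1": "Pair", "p2": "Pair"}, {("p1", "p2")}))

# One write to the Pair definition changes both instances inside Pipeline.
reg.definitions["Pair"].children["z"] = "Task"
print(reg.expand("Pipeline"))
```

The trade-off is that reads pay for the expansion; caching expanded graphs and invalidating them when a definition changes would be one way to offset that.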

I understand that a relational database handles all of this well: it keeps the relationships intact and makes querying simple. But performance will take a hit when the graphs are complex and several of them have to be updated.

So, I was wondering if there is a hybrid/better approach to storing, retrieving and updating these graphs that would be much faster compared to relational databases.

Any ideas would be really helpful. Thanks in advance!

asked Jul 18 '18 10:07 by Amith Koujalgi


1 Answer

I wouldn't rule out graph databases. You can build the missing features yourself, using extra properties, nodes, or connections that serve your needs.

E.g. to create a group, you could create a node with a property type:Group and give it a groupId that all the nodes belonging to that group share.

Another option would be for group members to have an extra connection towards their Group: Node-belongsToGroup->GroupNode.

With either of the above, connecting a Node or Group to another Group only requires creating a single connection to that Group's node.
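Both grouping options can be tried out on a tiny in-memory property graph before committing to a database. The sketch below is not any graph database's API: `PropertyGraph` and its methods are hypothetical, and the `connectsTo` label is an assumed name for an ordinary edge.

```python
from collections import defaultdict


class PropertyGraph:
    """Minimal property graph: nodes carry properties, edges carry a label."""

    def __init__(self):
        self.props = {}                # node id -> property dict
        self.edges = defaultdict(set)  # (source id, label) -> target ids

    def add_node(self, node_id, **props):
        self.props[node_id] = dict(props)

    def connect(self, src, label, dst):
        self.edges[(src, label)].add(dst)

    def members_by_property(self, group_id):
        # Option 1: members share the group's groupId property.
        return sorted(
            node for node, p in self.props.items()
            if p.get("groupId") == group_id and p.get("type") != "Group"
        )

    def members_by_edge(self, group_node):
        # Option 2: members point at the group node via belongsToGroup.
        return sorted(
            src for (src, label), targets in self.edges.items()
            if label == "belongsToGroup" and group_node in targets
        )


g = PropertyGraph()
g.add_node("G1", type="Group", groupId="g-1")
g.add_node("n1", type="Node", groupId="g-1")
g.add_node("n2", type="Node", groupId="g-1")
g.add_node("n3", type="Node")
g.connect("n1", "belongsToGroup", "G1")
g.connect("n2", "belongsToGroup", "G1")
g.connect("n3", "connectsTo", "G1")  # an outside node links to the group node only
print(g.members_by_property("g-1"), g.members_by_edge("G1"))
```

The property variant needs a scan or an index on groupId to list members, while the edge variant reuses the same traversal machinery as every other connection.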

The same goes for Definitions, e.g. Node-isOfType->DefinitionNode. Then updateDefinition would update all nodes that belong to that Definition.

Based on the above, I think it would be easy to create an API like the following:

createGroup
isGroup
addNodesToGroup
createDefinition
updateDefinition
setNodeDefinition
getNodeDefinition
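A minimal in-memory sketch of such an API might look like this. The signatures, the `GraphStore` class, and the `create_node` helper are assumptions made for illustration; the method names are snake_case versions of the functions listed above, and a real implementation would issue queries against the graph database instead of using dictionaries.

```python
from __future__ import annotations


class GraphStore:
    """Hypothetical store backing the API above with plain dictionaries."""

    def __init__(self) -> None:
        self.kind: dict[str, str] = {}           # node id -> "Node" or "Group"
        self.group_of: dict[str, str] = {}       # node id -> group id (belongsToGroup)
        self.definition_of: dict[str, str] = {}  # node id -> definition id (isOfType)
        self.definitions: dict[str, dict] = {}   # definition id -> properties

    def create_node(self, node_id: str) -> None:  # helper, not in the list above
        self.kind[node_id] = "Node"

    def create_group(self, group_id: str) -> None:  # createGroup
        self.kind[group_id] = "Group"

    def is_group(self, node_id: str) -> bool:  # isGroup
        return self.kind.get(node_id) == "Group"

    def add_nodes_to_group(self, group_id: str, node_ids: list[str]) -> None:  # addNodesToGroup
        if not self.is_group(group_id):
            raise ValueError(f"{group_id!r} is not a group")
        for node_id in node_ids:
            self.group_of[node_id] = group_id

    def create_definition(self, def_id: str, **props) -> None:  # createDefinition
        self.definitions[def_id] = dict(props)

    def update_definition(self, def_id: str, **props) -> list[str]:  # updateDefinition
        # Nodes only reference the definition, so one write reaches all of them.
        self.definitions[def_id].update(props)
        return sorted(n for n, d in self.definition_of.items() if d == def_id)

    def set_node_definition(self, node_id: str, def_id: str) -> None:  # setNodeDefinition
        if def_id not in self.definitions:
            raise KeyError(def_id)
        self.definition_of[node_id] = def_id

    def get_node_definition(self, node_id: str) -> dict:  # getNodeDefinition
        return self.definitions[self.definition_of[node_id]]


store = GraphStore()
store.create_group("G1")
for node in ("n1", "n2"):
    store.create_node(node)
store.add_nodes_to_group("G1", ["n1", "n2"])
store.create_definition("Filter", inputs=1, outputs=1)
store.set_node_definition("n1", "Filter")
store.set_node_definition("n2", "Filter")
affected = store.update_definition("Filter", outputs=2)
print(affected, store.get_node_definition("n1"))
```

Here `update_definition` returns the affected node ids so a caller could refresh any cached views of the graphs that contain them.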

As far as scalability is concerned, you could check OrientDB: Distributed-Architecture / comparison to neo4j

...only one server can be the master, so the Neo4j write throughput is limited to the capacity of the single Master server. This means that Neo4j isn’t able to scale on writes.

OrientDB, instead, supports a Multi-Master + Sharded architecture: all the servers are masters. The throughput is not limited by a single server. With OrientDB, the global throughput is the sum of the throughput of all the servers.

API reference: Java API / SQL reference

answered Sep 29 '22 12:09 by Marinos An