
Best practice for storing a neural network in a database

I am developing an application that uses a neural network. I am currently considering storing it either in a SQL-based relational database (probably SQL Server) or in a graph database.

From a performance viewpoint, the neural net will be very large.

My questions:

  1. Do relational databases suffer a performance hit when dealing with a neural net in comparison to graph databases?
  2. What graph-database technology would be best suited to dealing with a large neural net?
  3. Can a geospatial database such as PostGIS be used to represent a neural net efficiently?
ose asked Sep 20 '13 09:09



1 Answer

That depends on how you intend the model to evolve.

  1. Do you have a fixed idea of an immutable network structure, such as a Kohonen map or an off-the-shelf model?
  2. Do you have several relationship structures to test, so that you want to be able to flip a switch to alternate between them?
  3. Does your model treat the nodes as fluid automatons, free to seek their own neighbours, where each automaton develops unique characteristic values for a common set of parameters and you need to analyse how those values affect its "choice" of neighbours?
  4. Do you have a fixed set of parameters for a fixed number of types/classes of nodes? Or is a node expected to develop a unique range of attributes and relationships?
  5. Do you frequently need to access each node, especially those embedded deep in the network layers, to analyse and correlate them?
  6. Is your network perceivable as, or quantizable into, a set of state machines?

Disclaimer
First of all, I must disclose that I am familiar only with Kohonen maps. (And I admit I have been derided for that, Kohonen maps being regarded as barely entry-level neural networks.) The questions above are the result of personal thought experiments over the years, prompted by random and not very well-educated reading about various neural schemes.

Category vs Parameter vs Attribute
Can we classify vehicles by the number of wheels or by tonnage? Should wheel count or tonnage be attributes, parameters or category characteristics?

Understanding this debate is a crucial step in structuring your repository, and it is especially relevant to disease and patient vectors. I have seen relational schemata for patient information, designed by medical experts who obviously had little training in information science, that presume a common set of parameters for every patient. The result is thousands of columns, mostly unused, in each patient record. When they exceed the column limit for a table, they create a new table with thousands more sparsely used columns.

  • Type 1: All nodes have a common set of parameters and hence a node can be modeled into a table with a known number of columns.

  • Type 2: There are various classes of nodes. There is a fixed number of classes of nodes. Each class has a fixed set of parameters. Therefore, there is a characteristic table for each class of node.

  • Type 3: There is no intent to pigeon-hole the nodes. Each node is free to develop and acquire its own unique set of attributes.

  • Type 4: There is a fixed number of classes of nodes. Each node within a class is free to develop and acquire its own unique set of attributes, but each class has a restricted set of attributes a node is allowed to acquire.

Read up on the EAV (entity-attribute-value) model to understand the issue of parameters vs attributes. In an EAV table, a node needs only three characterising columns:

  • node id
  • attribute name
  • attribute value

However, under the constraints of the technology, an attribute value could be a number, a string, an enumerable or a category. Therefore, there would be four more attribute tables, one for each value type, plus the node table, with these columns:

  • node id
  • attribute type
  • attribute name
  • attribute value
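The typed EAV layout described above could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module: the table names, column names and the helpers `set_attribute` and `get_attributes` are hypothetical, and only two of the four value types (numeric and string) are shown to keep it short.

```python
import sqlite3

# Hypothetical schema: one node table plus one attribute table per value type.
# Only the numeric and string tables are sketched; enumerable and category
# tables would follow the same pattern.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY);
CREATE TABLE numeric_attributes (
    node_id INTEGER NOT NULL REFERENCES nodes(id),
    name    TEXT    NOT NULL,
    value   REAL    NOT NULL,
    PRIMARY KEY (node_id, name)
);
CREATE TABLE string_attributes (
    node_id INTEGER NOT NULL REFERENCES nodes(id),
    name    TEXT    NOT NULL,
    value   TEXT    NOT NULL,
    PRIMARY KEY (node_id, name)
);
""")

def set_attribute(node_id, name, value):
    """Store an attribute in the table that matches its Python type."""
    is_number = isinstance(value, (int, float))
    table = "numeric_attributes" if is_number else "string_attributes"
    conn.execute("INSERT OR IGNORE INTO nodes (id) VALUES (?)", (node_id,))
    conn.execute(
        f"INSERT OR REPLACE INTO {table} (node_id, name, value) VALUES (?, ?, ?)",
        (node_id, name, value),
    )

def get_attributes(node_id):
    """Collect a node's attributes from every typed table into one dict."""
    attrs = {}
    for table in ("numeric_attributes", "string_attributes"):
        rows = conn.execute(
            f"SELECT name, value FROM {table} WHERE node_id = ?", (node_id,)
        )
        attrs.update(dict(rows))
    return attrs

set_attribute(1, "bias", 0.25)
set_attribute(1, "activation", "tanh")
set_attribute(2, "bias", -1.0)  # node 2 never acquires an "activation" attribute
```

Each node stores only the attributes it has actually acquired, so no sparsely filled columns are needed.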

Sequential/linked access versus hashed/direct-address access
Do you need to reach individual nodes quickly by direct access, rather than by traversing the structural tree to get to them?

Do you need to find a list of nodes that have acquired a particular trait (set of attributes) regardless of where they sit topologically on the network? Do you need to perform classification or dimensionality reduction (for example, principal component analysis) on the nodes of your network?
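To make the two access patterns concrete, here is a minimal sketch on a toy in-memory network. The data and the names `edges`, `attributes`, `find_by_traversal` and `find_by_index` are all made up for illustration; a real repository would delegate the second pattern to a database index.

```python
from collections import defaultdict, deque

# Hypothetical toy network: adjacency list plus per-node attribute dicts.
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
attributes = {
    "a": {"layer": 0},
    "b": {"layer": 1, "saturated": True},
    "c": {"layer": 1},
    "d": {"layer": 2, "saturated": True},
}

def find_by_traversal(start, predicate):
    """Sequential/linked access: walk the graph breadth-first from start."""
    seen, queue, hits = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        if predicate(attributes[node]):
            hits.append(node)
        for neighbour in edges[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return hits

# Hashed/direct access: an inverted index from (attribute, value) to node ids.
trait_index = defaultdict(set)
for node, attrs in attributes.items():
    for name, value in attrs.items():
        trait_index[(name, value)].add(node)

def find_by_index(name, value):
    """Look up nodes by trait without touching the topology."""
    return sorted(trait_index.get((name, value), set()))
```

The traversal visits every reachable node, whereas the index answers the same question with a single hash lookup, at the cost of keeping the index up to date whenever attributes change.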

State-machine
Do you wish to perceive the regions of your network as a collection of state machines? State machines are very useful quantization entities. State-machine quantization helps you form empirical entities over a range of nodes, based on neighbourhood similarities and relationships.

Instead of trying to understand and track the individual behaviour of millions of nodes, why not clump them into regions of similarity and track the state-machine flow of those regions?
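One possible reading of this idea is sketched below. Everything in it is an assumption made for illustration: nodes are clumped by bucketing a single characteristic value, each region is reduced to one of two states ("active" or "idle") from its mean activation, and the function names, bucket width and threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def assign_regions(weights, bucket_width=0.5):
    """Clump nodes whose characteristic value falls in the same bucket."""
    regions = defaultdict(list)
    for node, weight in weights.items():
        regions[int(weight // bucket_width)].append(node)
    return dict(regions)

def region_state(activations, members, threshold=0.5):
    """Reduce a region to one of two states from its mean activation."""
    level = mean(activations[n] for n in members)
    return "active" if level >= threshold else "idle"

def track_transitions(regions, snapshots):
    """Record each region's state changes across successive snapshots."""
    history = {region: [] for region in regions}
    for activations in snapshots:
        for region, members in regions.items():
            state = region_state(activations, members)
            # Only record a state when it differs from the previous one.
            if not history[region] or history[region][-1] != state:
                history[region].append(state)
    return history

weights = {"n1": 0.1, "n2": 0.3, "n3": 0.7, "n4": 0.9}
regions = assign_regions(weights)
snapshots = [
    {"n1": 0.9, "n2": 0.8, "n3": 0.1, "n4": 0.2},
    {"n1": 0.2, "n2": 0.1, "n3": 0.1, "n4": 0.3},
    {"n1": 0.1, "n2": 0.2, "n3": 0.9, "n4": 0.8},
]
history = track_transitions(regions, snapshots)
```

Here four nodes collapse into two regions, and what gets stored and analysed is the short sequence of state changes per region rather than every node's activation at every step.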

Conclusion

This is my recommendation. Start with a fully relational database. The reason is that a relational database and its associated SQL give you a very liberal view of relationships. With SQL on a relational model, you can query or correlate relationships that you did not know existed.

As your experiments progress, you might find that certain relationships are better modelled in a network-graph repository. You should then move those parts of the schema to such a repository.

In the final state of affairs, I would maintain a dual-mode information repository. You store the dynamically mutating structure in a network-graph repository, but each node refers to a node id in a relational database. The relational database keeps track of nodes and their attributes and lets you query nodes by attributes and their values. For example,

SELECT a.id FROM Nodes a, NumericAttributes b
WHERE a.attributeName = $name
  AND b.value BETWEEN $rangeLow AND $rangeHigh
  AND a.id = b.id
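A runnable sketch of the dual-mode idea might look like this. It uses Python's built-in sqlite3 module for the relational side and a plain dict as a stand-in for the network-graph repository. Note the assumptions: the layout is simplified so that attribute name and value sit in one `NumericAttributes` table, and the sample data and the helpers `nodes_in_range` and `neighbours_of_matches` are hypothetical.

```python
import sqlite3

# Stand-in for the network-graph repository: node id -> neighbour ids.
graph = {1: [2, 3], 2: [3], 3: []}

# Relational side: one row per (node, numeric attribute).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NumericAttributes (
    id    INTEGER NOT NULL,   -- node id shared with the graph repository
    name  TEXT    NOT NULL,
    value REAL    NOT NULL
);
CREATE INDEX idx_numeric_name_value ON NumericAttributes (name, value);
""")
conn.executemany(
    "INSERT INTO NumericAttributes (id, name, value) VALUES (?, ?, ?)",
    [(1, "bias", 0.2), (2, "bias", 0.9), (3, "bias", 0.4), (3, "gain", 0.3)],
)

def nodes_in_range(name, low, high):
    """Return ids of nodes whose named attribute lies in [low, high]."""
    rows = conn.execute(
        "SELECT id FROM NumericAttributes "
        "WHERE name = ? AND value BETWEEN ? AND ? ORDER BY id",
        (name, low, high),
    )
    return [row[0] for row in rows]

def neighbours_of_matches(name, low, high):
    """Select nodes relationally, then follow their edges in the graph store."""
    return {node: graph[node] for node in nodes_in_range(name, low, high)}
```

The relational query picks nodes by attribute value, and the shared node id is the only link needed to continue into the graph structure.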

Perhaps Hadoop could be used instead of a traditional network-graph database, but I don't know how well Hadoop adapts to dynamically changing relationships. My understanding is that Hadoop is good for write-once, read-many workloads, so it may not perform well for a dynamic neural network whose relationships change frequently. On the other hand, a relational table that models network relationships is not efficient either.

Still, I believe I have only raised questions you need to consider rather than given you a definite answer, especially as my knowledge of many of these concepts is rusty.

Blessed Geek answered Sep 28 '22 06:09
