A Markov chain is composed of a set of states which can transition to other states with a certain probability.
A Markov chain can be easily represented in Neo4J by creating a node for each state, a relationship for each transition, and then annotating the transition relationships with the appropriate probability.
BUT, can you simulate the Markov chain using Neo4J? For instance, can Neo4J be coerced to start in a certain state and then make transitions to the next state and the next state based upon probabilities? Can Neo4J return with a printout of the path that it took through this state space?
Perhaps this is easier to understand with a simple example. Let's say I want to make a 2-gram model of English based upon the text of my company's tech blog. I spin up a script which does the following:
count/totalcount
. This is the transition probability.Now that the Neo4J graph is complete, how do I make it create a "sentence" from my 2-gram model of English? Here is what the output might look like:
IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.
Simulating from a Markov Chain One can simulate from a Markov chain by noting that the collection of moves from any given state (the corresponding row in the probability matrix) form a multinomial distribution. One can thus simulate from a Markov Chain by simulating from a multinomial distribution.
Neo4j is a graph database. A graph database, instead of having rows and columns has nodes edges and properties. It is more suitable for certain big data and analytics applications than row and column databases or free-form JSON document databases for many use cases. A graph database is used to represent relationships.
Neo4j doesn't provide the functionality you're asking for out of the box, but since you've already come as far as correctly populating your database, the traversal that you need is just a few lines of code.
I've recreated your experiment here, with a few modifications. First of all, I populate the database with a single pass through the text (steps 2 and 3), but that's a minor. More importantly, I only store the number of occurrences on each relationship and the total number on the node (step 4), as I don't think there is a need to pre-calculate probabilities.
The code that you're asking for then looks something like this:
/**
* A component that creates a random sentence by a random walk on a Markov Chain stored in Neo4j, produced by
* {@link NGramDatabasePopulator}.
*/
public class RandomSentenceCreator {
private final Random random = new Random(System.currentTimeMillis());
/**
* Create a random sentence from the underlying n-gram model. Starts at a random node an follows random outgoing
* relationships of type {@link Constants#REL} with a probability proportional to that transition occurrence in the
* text that was processed to form the model. This happens until the desired length is achieved. In case a node with
* no outgoing relationships it reached, the walk is re-started from a random node.
*
* @param database storing the n-gram model.
* @param length desired number of characters in the random sentence.
* @return random sentence.
*/
public String createRandomSentence(GraphDatabaseService database, int length) {
Node startNode = randomNode(database);
return walk(startNode, length, 0);
}
private String walk(Node startNode, int maxLength, int currentLength) {
if (currentLength >= maxLength) {
return (String) startNode.getProperty(NAME);
}
int totalRelationships = (int) startNode.getProperty(TOTAL, 0);
if (totalRelationships == 0) {
//terminal node, restart from random
return walk(randomNode(startNode.getGraphDatabase()), maxLength, currentLength);
}
int choice = random.nextInt(totalRelationships) + 1;
int total = 0;
Iterator<Relationship> relationshipIterator = startNode.getRelationships(OUTGOING, REL).iterator();
Relationship toFollow = null;
while (total < choice && relationshipIterator.hasNext()) {
toFollow = relationshipIterator.next();
total += (int) toFollow.getProperty(PROBABILITY);
}
Node nextNode;
if (toFollow == null) {
//no relationship to follow => stay on the same node and try again
nextNode = startNode;
} else {
nextNode = toFollow.getEndNode();
}
return ((String) nextNode.getProperty(NAME)).substring(0, 1) + walk(nextNode, maxLength, currentLength + 1);
}
private Node randomNode(GraphDatabaseService database) {
return random(GlobalGraphOperations.at(database).getAllNodes());
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With