I am trying to represent a group of sentences as a directed graph where one word is represented by one node. If a word is repeated then the node is not repeated, the previously existing node is used. Let's call this graph MainG.
Following this, I take a new sentence, creating a directed graph of this sentence (call this graph SubG) and then looking for the Maximum Common Subgraph of SubG in MainG.
I am using NetworkX api in Python 3.5. I understand that as this is NP-Complete problem for normal graphs, but for Directed Graphs it is a Linear problem. One of the links I referred:
How can I find Maximum Common Subgraph of two graphs?
I tried to do the following code:
import networkx as nx
import pandas as pd
import nltk
class GraphTraversal:
    def createGraph(self, sentences):
        DG=nx.DiGraph()
        tokens = nltk.word_tokenize(sentences)
        token_count = len(tokens)
        for i in range(token_count):
            if i == 0:
                continue
            DG.add_edges_from([(tokens[i-1], tokens[i])], weight=1)
        return DG
    def getMCS(self, G_source, G_new):
        """
        Creator: Bonson
        Return the MCS of the G_new graph that is present 
        in the G_source graph
        """
        order =  nx.topological_sort(G_new)
        print("##### topological sort #####")
        print(order)
        objSubGraph = nx.DiGraph()
        for i in range(len(order)-1):
            if G_source.nodes().__contains__(order[i]) and G_source.nodes().__contains__(order[i+1]):
                print("Contains Nodes {0} -> {1} ".format(order[i], order[i+1]))
                objSubGraph.add_node(order[i])
                objSubGraph.add_node(order[i+1])
                objSubGraph.add_edge(order[i], order[i+1])
            else:
                print("Does Not Contains Nodes {0} -> {1} ".format(order[i], order[i+1]))
                continue
obj_graph_traversal = GraphTraversal()
SourceSentences = "A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ."
SourceGraph = obj_graph_traversal.createGraph(SourceSentences)
TestSentence_1 = "not much of a story"    #ThisWorks
TestSentence_1 = "not much of a story of what is good"    #This DOES NOT Work
TestGraph = obj_graph_traversal.createGraph(TestSentence_1)
obj_graph_traversal.getMCS(SourceGraph, TestGraph)
As I am trying to do a topological sort, the second one doesn't work.
Would be interested in understanding the possible approaches to this.
The following code gets the maximum common subgraph from a directed graph:
def getMCS(self, G_source, G_new):
    matching_graph=nx.Graph()
    for n1,n2,attr in G_new.edges(data=True):
        if G_source.has_edge(n1,n2) :
            matching_graph.add_edge(n1,n2,weight=1)
    graphs = list(nx.connected_component_subgraphs(matching_graph))
    mcs_length = 0
    mcs_graph = nx.Graph()
    for i, graph in enumerate(graphs):
        if len(graph.nodes()) > mcs_length:
            mcs_length = len(graph.nodes())
            mcs_graph = graph
    return mcs_graph
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With