Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to check similarity of two Xml trees (Tree Edit Distance in C#)

In a C# application I need to check the output of my algorithm, which is an XML tree against another XML tree to see how they are similar. (node order is important, but the structure (nested nodes), names of nodes are more important). Maybe the number of adds, removes and moves that occurs in some "Tree Edit distance" algorithms be a good indicator. But the answers are more Java or Python packages.

So, I tried to use XMLDiffPatch, it works good when the algorithm type is set to Precise. However its bad point is that it just generate a DiffGram file that needs to be analyzed to find the number of operations. Moreover, it is very buggy and generates OutOfRangeException for some XML trees. I also couldn't find better packages for my purpose for .Net. There are some Xml difference packages but maybe none or few of them are on Tree Edit Distance.

An example:

<A>
  <B>
    <C></C>
    <D></D>
    <E>
       <F>
       </F>
    </E>
  </B>
</A>

To:

<A>      
    <C></C>
    <D></D>
    <G></G>
</A>

To convert the first Xml to the second, you need to remove E and F (2 costs) then you need to remove B (but not its sub tree) and to add G. Then the total costs is 4.

So, as I know here I shouldn't ask for packages and tools, I ask for a simple algorithm or (tree edit distance algorithm in .Net) to do that. This is my own algorithm to check similarity and ignore minor difference (Having one or few nested nodes) but it is very primary and just for an starting point:

public int XMLCompare(XmlNode primary, XmlNode secondary)
{
    int x = 0;
    if (secondary == null || primary == null)
        return 1;

    if (secondary.ChildNodes.Count == 1 && primary.ChildNodes.Count > 1)
    {
        x += XMLCompare(primary, secondary.ChildNodes[0]);
    }
    else if (secondary.ChildNodes.Count > 1 && primary.ChildNodes.Count == 1)
    {
        x += XMLCompare(primary.ChildNodes[0], secondary);
    }
    else
    {
        if (primary.Name.ToLower() != secondary.Name.ToLower())
            x = 1;
        int m = Math.Max(primary.ChildNodes.Count, secondary.ChildNodes.Count);
        for (int i = 0; i < m  i++)
        {
            x += XMLCompare(
            i < primary.ChildNodes.Count ? primary.ChildNodes[i] : null,  
            i < secondary.ChildNodes.Count ? secondary.ChildNodes[i] : null);

        }
    }

    return x;
}
like image 501
Ahmad Avatar asked Feb 14 '16 14:02

Ahmad


1 Answers

Microsoft has an API for that. check this. This may be old dll reference but just for you information, you need to use something like this. C:\Windows\assembly\GAC\XmlDiffPatch\1.0.8.28__b03f5f7f11d50a3a\XmlDiffPatch.dll

like image 152
DOTNET Team - WeblineIndia Avatar answered Nov 17 '22 15:11

DOTNET Team - WeblineIndia