Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Serialize prefix tree

I get a ProtoException ("Possible recursion detected (offset: 4 level(s)): o EOW") when serializing a tree structure like so:

var tree = new PrefixTree();
        tree.Add("racket".ToCharArray());
        tree.Add("rambo".ToCharArray());
        using (var stream = File.Open("test.prefix", FileMode.Create))
        {
            Serializer.Serialize(stream, tree);
        }

The tree implementation:

[ProtoContract]
public class PrefixTree
{
    public PrefixTree()
    {
        _nodes = new Dictionary<char, PrefixTree>();
    }

    public PrefixTree(char[] chars, PrefixTree parent)
    {
        if (chars == null) throw new ArgumentNullException("chars");
        if (parent == null) throw new ArgumentNullException("parent");
        if (chars.Length == 0) throw new ArgumentException();

        _parent = parent;
        _nodes = new Dictionary<char, PrefixTree>();
        _value = chars[0];

        var overflow = chars.SubSet(1);
        if (!overflow.Any()) _endOfWord = true;
        else Add(overflow.ToArray());
    }

    [ProtoMember(1)]
    private readonly char _value;
    [ProtoMember(2)]
    private readonly bool _endOfWord;
    [ProtoMember(3)]
    private readonly IDictionary<char, PrefixTree> _nodes;
    [ProtoMember(4, AsReference = true)]
    private readonly PrefixTree _parent;

    public void Add(char[] word)
    {
        if (word == null) throw new ArgumentNullException("word");
        if (word.Length == 0) return;

        var character = word[0];
        PrefixTree node;
        if (_nodes.TryGetValue(character, out node))
        {
            node.Add(word.SubSet(1));
        }
        else
        {
            node = new PrefixTree(word, this);
            _nodes.Add(character, node);
        }
    }

    public override string ToString()
    {
        return _endOfWord ? _value + " EOW" : _value.ToString();
    }
}

public static class ListHelper
{
    public static char[] SubSet(this char[] source, int start)
    {
        return source.SubSet(start, source.Length - start);
    }

    public static char[] SubSet(this char[] source, int start, int length)
    {
        if (start < 0) throw new ArgumentOutOfRangeException();
        if (start > source.Length) throw new ArgumentOutOfRangeException();
        if (length < 0) throw new ArgumentOutOfRangeException();

        var result = new char[length];
        Array.Copy(source, start, result, 0, length);
        return result;
    }
}

Am I decorating with the wrong attributes or have I simply designed a non-serializable tree?

Edit: tried this to no avail:

var typeModel = RuntimeTypeModel.Default;
        var type = typeModel.Add(typeof(PrefixTree), false);
        type.AsReferenceDefault = true;
        type.Add("_value", "_endOfWord", "_nodes", "_parent");

        var tree = new PrefixTree();
        tree.Add("racket".ToCharArray());
        tree.Add("rambo".ToCharArray());
        using (var stream = File.Open("test.prefix", FileMode.Create))
        {
            typeModel.Serialize(stream, tree);
        }
like image 981
Marcus Avatar asked May 18 '12 07:05

Marcus


1 Answers

The _parent and Value of the _nodes both point to the same type ( PrefixTree ), but only the _parent is marked as "AsReference".

If you walk the serialization stack you will see that the Value of the Dictionary value is serialized independently of the _parent item and is not checked for a duplicate instance.

As it walks the tree there is an internal serialization depth check of 25, at which it starts detecting the duplicate instances. If this value was bigger it wouldn't throw an exception, if it was smaller it would throw on a node higher up the tree.

I also don't think this would be deserializable, and certainly if it did that the _parent field's value of each child node won't be the same instance as the _nodes container.

You need to create you're own dictionary type ( subclass Dictionary<,> or implement IDictionary<,> ) so you can added the [ProtoContract] attribute and control the serialization of dictionary's items.

ie

[ProtoContract]
public class NodeItem
{
    [ProtoMember(1)]
    public char Key { get; set; }
    [ProtoMember(2, AsReference = true)]
    public PrefixTree Value { get; set; }
}

[ProtoContract]
public class Nodes : IDictionary<char, PrefixTree>
{
    private readonly IDictionary<char, PrefixTree> inner;

    [ProtoMember(1)]
    public NodeItem[] Items
    {
        get
        {
            return this.inner.Select(item => new NodeItem() {Key = item.Key, Value = item.Value}).ToArray();
        }
        set
        {
            foreach( NodeItem item in value)
            {
                this.inner.Add(item.Key, item.Value);
            }
        }
    }
    ... // Omitted IDictionary members for clarity

the key here is to get the AsReference metadata attached to the nodes's PrefixTree. Also note that Items is returning an Array, if you want it as a list, then you need to use set the OverwriteList attribute member.

I also needed to remove the readonly keyword for each field in the PrefixTree type. This unit test passed for me.

        [TestMethod]
    public void TestMethod1()
    {
        var tree = new PrefixTree();
        tree.Add("racket".ToCharArray());
        tree.Add("rambo".ToCharArray());

        PrefixTree tree2 = null;

        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, tree);
            stream.Position = 0;
            tree2 = Serializer.Deserialize<PrefixTree>(stream);
        }


        Assert.IsNotNull(tree2);
        Assert.AreEqual(tree._nodes.Count, tree2._nodes.Count);
        Assert.AreEqual(2, tree2._nodes['r']._nodes['a']._nodes.Count);      // 'c' and 'm'
        Assert.AreEqual('c', tree2._nodes['r']._nodes['a']._nodes.Values.First().Value);
        Assert.AreEqual('m', tree2._nodes['r']._nodes['a']._nodes.Values.Last().Value);
    }
like image 59
Robert Slaney Avatar answered Oct 23 '22 17:10

Robert Slaney