
Best practice for protobuf-net, versioning and surrogate types

I'm trying to determine how to address this use case using protobuf-net (Marc Gravell's implementation).

  • We have class A, which is considered version 1
  • An instance of class A has been serialized to disk
  • We now have class B, which is considered version 2 of class A (there were so many things wrong with class A, we had to create class B for the next version). Class A still exists in code, but only for legacy purposes.
  • I want to deserialize the version:1 data (stored to disk) as a class B instance, and use a logic routine to translate the data from the previous class A instance to a new instance of class B.
  • The instance of class B will be serialized to disk during operation.
  • The application should expect to deserialize instances of both class A and B.

The concept of data contract surrogates and the DataContractSerializer come to mind. The goal is to transition the version:1 data to the new class B structure.

An example:

[DataContract]
public class A {

     public A(){}

     [DataMember]
     public bool IsActive {get;set;}

     [DataMember]
     public int VersionNumber {
          get { return 1; }
          set { }
     }

     [DataMember]
     public int TimeInSeconds {get;set;}

     [DataMember]
     public string Name {get;set;}

     [DataMember]
     public CustomObject CustomObj {get;set;} //Also a DataContract

     [DataMember]
     public List<ComplexThing> ComplexThings {get;set;} //Also a DataContract
     ...
}

[DataContract]
public class B {

     public B(A a) {
          this.Enabled = a.IsActive; //Property now has a different name
          this.TimeInMilliseconds = a.TimeInSeconds * 1000; //Property requires math for correctness
          this.Name = a.Name;
          this.CustomObject = new CustomObject2(a.CustomObj); //Reference objects change, too
          this.ComplexThings = new List<ComplexThing>();
          this.ComplexThings.AddRange(a.ComplexThings);
          ...
     }

     public B(){}

     [DataMember]
     public bool Enabled {get;set;}

     [DataMember]
     public int Version {
          get { return 2; }
          set { }
     }

     [DataMember]
     public double TimeInMilliseconds {get;set;}

     [DataMember]
     public string Name {get;set;}

     [DataMember]
     public CustomObject2 CustomObject {get;set;} //Also a DataContract

     [DataMember]
     public List<ComplexThing> ComplexThings {get;set;} //Also a DataContract
     ...
}

Class A was the first iteration of our object, and is actively in use. Data exists in v1 format, using class A for serialization.

After realizing the error of our ways, we create a new structure called class B. There are so many changes between A and B that we feel it's better to create B, as opposed to adapting the original class A.

But our application already exists and class A is being used to serialize data. We're ready to roll our changes out to the world, but we must first deserialize data created under version 1 (using class A) and instantiate it as class B. The data is significant enough that we can't just assume defaults in class B for missing data, but rather we must transition the data from a class A instance to class B. Once we have a class B instance, the application will serialize that data again in class B format (version 2).

We're assuming we'll make modifications to class B in the future, and we want to be able to iterate to a version 3, perhaps in a new class "C". We have two goals: address data already in existence, and prepare our objects for future migration.

The existing "transition" attributes (OnSerializing/OnSerialized, OnDeserializing/OnDeserialized, etc.) don't provide access to the previous data.

What is the expected practice when using protobuf-net in this scenario?

asked Jan 23 '13 at 23:01 by jro


2 Answers

Right; looking at them you have indeed completely changed the contract. I know of no contract-based serializer that is going to love you for that, and protobuf-net is no different. If you already had a root node, you could do something like (in pseudo-code):

[Contract]
class Wrapper {
    [Member] public A A {get;set;}
    [Member] public B B {get;set;}
    [Member] public C C {get;set;}
}

and just pick whichever of A/B/C is non-null, perhaps adding some conversion operators between them. However, if you just have a naked A in the old data, this gets hard. There are two approaches I can think of:

  • add lots of shim properties for compatibility; not very maintainable, and I don't recommend it
  • sniff the Version as an initial step, and tell the serializer what to expect.

For example, you could do:

const int VERSION_FIELD_NUMBER = /* todo */;
int version = -1;
// Note: in protobuf-net 2.x the public constructor is
// ProtoReader(Stream, TypeModel, SerializationContext); adjust this call to your version.
using(var reader = new ProtoReader(inputStream)) {
    while(reader.ReadFieldHeader() > 0) {
        if(reader.FieldNumber == VERSION_FIELD_NUMBER) {
            version = reader.ReadInt32();
            // optional short-circuit; we're not expecting 2 Version numbers
            break;
        } else {
            reader.SkipField();
        }
    }
}
inputStream.Position = 0; // rewind before deserializing (requires a seekable stream)

Now you can use the serializer, telling it what version it was serialized as; either via the generic Serializer.Deserialize<T> API, or via a Type instance from the two non-generic APIs (Serializer.NonGeneric.Deserialize or RuntimeTypeModel.Default.Deserialize - either way, you get to the same place; it is really a case of whether generic or non-generic is most convenient).
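As a rough sketch of that dispatch step (not the only way to do it), the sniffed version can be mapped to a Type and handed to the non-generic API. VersionedReader, Read and the typesByVersion map are hypothetical names introduced for illustration; Serializer.NonGeneric.Deserialize(Type, Stream) is the protobuf-net call mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

public static class VersionedReader
{
    // Deserializes the stream as whichever type is registered for the sniffed version.
    // The caller is expected to have rewound the stream after sniffing.
    public static object Read(Stream input, int version,
                              IDictionary<int, Type> typesByVersion)
    {
        Type type;
        if (!typesByVersion.TryGetValue(version, out type))
        {
            throw new NotSupportedException(
                "No type registered for data version " + version);
        }

        return Serializer.NonGeneric.Deserialize(type, input);
    }
}

// Possible usage, assuming the question's A and B classes:
//   var map = new Dictionary<int, Type> { { 1, typeof(A) }, { 2, typeof(B) } };
//   object loaded = VersionedReader.Read(inputStream, version, map);
//   B current = loaded as B ?? new B((A)loaded);
```

Keeping the version-to-type map outside the reader means a future class C only needs one more entry in the map.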

Then you would need some conversion code between A / B / C - either via your own custom operators / methods, or by something like AutoMapper.
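For the custom-operator route, a minimal sketch might look like the following. The A and B here are trimmed-down stand-ins for the question's classes (only three members, no serialization attributes), and making the operator explicit is just one possible design choice.

```csharp
using System;

// Trimmed-down stand-in for the question's version 1 class.
public class A
{
    public bool IsActive { get; set; }
    public int TimeInSeconds { get; set; }
    public string Name { get; set; }
}

// Trimmed-down stand-in for the question's version 2 class.
public class B
{
    public bool Enabled { get; set; }
    public double TimeInMilliseconds { get; set; }
    public string Name { get; set; }

    // Explicit, so that a v1 -> v2 migration is always visible at the call site.
    public static explicit operator B(A a)
    {
        if (a == null) return null;

        return new B
        {
            Enabled = a.IsActive,                          // renamed property
            TimeInMilliseconds = a.TimeInSeconds * 1000.0, // unit change
            Name = a.Name
        };
    }
}
```

With this in place, a deserialized legacy instance converts with a cast: B current = (B)oldA;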

If you don't want any ProtoReader code kicking around, you could also do:

[DataContract]
class VersionStub {
    [DataMember(Order=VERSION_FIELD_NUMBER)]
    public int Version {get;set;}
}

and run Deserialize<VersionStub>, which will give you access to the Version, which you can then use to do the type-specific deserialize; the main difference here is that the ProtoReader code allows you to short-circuit as soon as you have a version-number.
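A small sketch of the stub approach, assuming a seekable stream. The field number 2 is chosen purely for illustration (the real value is whatever VERSION_FIELD_NUMBER is in your contracts, and it has to be the same field in every version), and VersionSniffer / ReadVersion are hypothetical names.

```csharp
using System.IO;
using ProtoBuf;

[ProtoContract]
public class VersionStub
{
    // Assumption for illustration: the version is stored in field 2.
    public const int VersionFieldNumber = 2;

    [ProtoMember(VersionFieldNumber)]
    public int Version { get; set; }
}

public static class VersionSniffer
{
    // Reads only the version field (all other fields are skipped),
    // then rewinds so the caller can run the type-specific deserialize.
    public static int ReadVersion(Stream input)
    {
        long start = input.Position;
        VersionStub stub = Serializer.Deserialize<VersionStub>(input);
        input.Position = start;
        return stub.Version;
    }
}
```

If the version field is absent from the data, Version stays at its default of 0, which can be treated as "unknown".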

answered Nov 05 '22 at 09:11 by Marc Gravell


I don't have an expected practice, but this is what I'd do.

Given that you still have access to your V1 class, add a property to it that exposes a V2 instance.

In the V1 class's ProtoAfterDeserialization callback, create the V2 instance. Since this is a migration, I'd suggest manually copying across what you need (or, if it's not too hard, try Merge; YMMV).

Also, throw an exception from the V1 class's ProtoBeforeSerialization callback so that you can no longer write out the old format.

Edit: Examples of using these (VB code)

<ProtoBeforeSerialization()>
Private Sub BeforeSerialisation()

End Sub

<ProtoAfterSerialization()>
Private Sub AfterSerialisation()

End Sub

<ProtoBeforeDeserialization()>
Private Sub BeforeDeserialisation()

End Sub

<ProtoAfterDeserialization()>
Private Sub AfterDeserialisation()

End Sub

Having seen your example, I hope the following does what you are trying to do. Class1 shows how to load and convert.

using ProtoBuf;
using System.Collections.Generic;
using System.IO;

public class Class1
{
    public Class1()
    {
        using (FileStream fs = new FileStream("c:\\formatADataFile.dat",
               FileMode.Open, FileAccess.Read))
        {
            A oldA = Serializer.Deserialize<A>(fs);
            B newB = oldA.ConvertedToB;
        }
    }
}


[ProtoContract()]
public class B
{

    public B(A a)
    {
        //Property now has a different name
        this.Enabled = a.IsActive;
        //Property requires math for correctness
        this.TimeInMilliseconds = a.TimeInSeconds * 1000;
        this.Name = a.Name;
        //Reference objects change, too (the property is named CustomObject)
        this.CustomObject = new CustomObject2(a.CustomObj);
        //The list may be null after deserialization (e.g. if it was empty), so guard the copy
        this.ComplexThings = new List<ComplexThing>();
        if (a.ComplexThings != null)
        {
            this.ComplexThings.AddRange(a.ComplexThings);
        }
        //...
    }

    public B() { }

    //[DataMember]
    [ProtoMember(1)]
    public bool Enabled { get; set; }

    //[DataMember]
    public int Version
    {
        get { return 2; }
        private set { }
    }

    [ProtoMember(2)]
    public double TimeInMilliseconds { get; set; }

    [ProtoMember(3)]
    public string Name { get; set; }

    [ProtoMember(4)]
    public CustomObject2 CustomObject { get; set; } //Also a DataContract

    [ProtoMember(5)]
    public List<ComplexThing> ComplexThings { get; set; } //Also a DataContract

    ///...
}

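[ProtoContract()]
public class CustomObject2
{
    public CustomObject2()
    {
        Something = string.Empty;
    }

    //Assumed conversion constructor, needed by B(A a); the int-to-string mapping is illustrative
    public CustomObject2(CustomObject old)
    {
        Something = old == null ? string.Empty : old.Something.ToString();
    }

    [ProtoMember(1)]
    public string Something { get; set; }
}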


[ProtoContract()]
public class A
{

    public A()
    {
        mBConvert = new B();
    }

    [ProtoMember(1)]
    public bool IsActive { get; set; }

    [ProtoMember(2)]
    public int VersionNumber
    {
        get { return 1; }
        private set { }
    }

    [ProtoBeforeSerialization()]
    private void NoMoreSavesForA()
    {
        throw new System.InvalidOperationException("Do Not Save A");
    }

    private B mBConvert;

    [ProtoAfterDeserialization()]
    private void TranslateToB()
    {
        mBConvert = new B(this);
    }

    public B ConvertedToB
    {
        get
        {
            return mBConvert;
        }
    }



    [ProtoMember(3)]
    public int TimeInSeconds { get; set; }

    [ProtoMember(4)]
    public string Name { get; set; }

    [ProtoMember(5)]
    public CustomObject CustomObj { get; set; } //Also a DataContract

    [ProtoMember(6)]
    public List<ComplexThing> ComplexThings { get; set; } //Also a DataContract
    //...
}

[ProtoContract()]
public class CustomObject
{
    public CustomObject()
    {

    }
    [ProtoMember(1)]
    public int Something { get; set; }
}

[ProtoContract()]
public class ComplexThing
{
    public ComplexThing()
    {

    }
    [ProtoMember(1)]
    public int SomeOtherThing { get; set; }
}
answered Nov 05 '22 at 09:11 by Paul Farry