
Can serializing the same object produce different streams?

Is there any situation in which serializing the same object could produce different streams (assuming one of the formatters built into .NET is used for both serializations)?

This came up in the discussion below this post. The claim was made that this can happen, yet no concrete explanation was offered, so I was wondering if anyone can shed some light on the issue?

Branko Dimitrijevic asked Oct 31 '11 13:10




1 Answer

As I explained in a comment on that SO question, the issue is caused (at least in the case I have discovered) by an optimisation of the string output. It seems that if several strings are the same reference, the formatter writes the string only once.

So what the sample code does is use one long string for all the properties of an object, change the reference (but not the value) of one of them, and then serialise. It then deserialises the stream back into an object, makes all three properties refer to the same string instance, and serialises again. This time the stream is smaller, even though the property values are unchanged.
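The explanation above hinges on the difference between two strings being equal in value and being the same object. A minimal sketch of that distinction, using only `==` and `object.ReferenceEquals` (the class and helper names here are illustrative and not part of the original answer):

```csharp
using System;

static class StringIdentitySketch
{
    // Hypothetical helper: returns a string with the same characters as
    // `source`, but built at run time so that it is a separate object.
    // Assumes 0 < splitAt < source.Length.
    public static string RebuildFromHalves(string source, int splitAt)
    {
        return source.Substring(0, splitAt) + source.Substring(splitAt);
    }

    static void Main()
    {
        const string Original = "A value that is long enough to split in two";
        string rebuilt = RebuildFromHalves(Original, 10);

        Console.WriteLine(Original == rebuilt);                       // True: same characters
        Console.WriteLine(object.ReferenceEquals(Original, rebuilt)); // False: different objects
    }
}
```

A formatter that tracks object identity can treat `Original` and `rebuilt` as two separate objects even though `==` reports them as equal, which is consistent with the size difference the answer describes.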

OK, here is the proof code:

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Proof
{
    public string S1 { get; set; }
    public string S2 { get; set; }
    public string S3 { get; set; }
}

class Program
{
    static void Main(string[] args)
    {

        const string LongString =
            "A value that is going to change the world nad iasjdsioajdsadj sai sioadj sioadj siopajsa iopsja iosadio jsadiojasd ";

        var proof = new Proof() {
            S1 = LongString,
            S2 = LongString,
            S3 = LongString
        };

        // Rebuild S2 from two substrings: the value is unchanged,
        // but S2 now refers to a different string instance than S1 and S3.
        proof.S2 = LongString.Substring(0, 10) + LongString.Substring(10);

        Console.WriteLine(proof.S1 == proof.S2);
        Console.WriteLine(proof.S1 == proof.S3);
        Console.WriteLine(proof.S2 == proof.S3);
        Console.WriteLine("So the values are all the same...");

        BinaryFormatter bf = new BinaryFormatter();
        MemoryStream stream = new MemoryStream();
        bf.Serialize(stream, proof);
        byte[] buffer = stream.ToArray();
        Console.WriteLine("buffer length is " + buffer.Length); // outputs 449 on my machine
        stream.Position = 0;
        var deserProof = (Proof) bf.Deserialize(new MemoryStream(buffer));
        // Make all three properties refer to the same string instance.
        deserProof.S1 = deserProof.S2;
        deserProof.S3 = deserProof.S2;
        MemoryStream stream2 = new MemoryStream();
        new BinaryFormatter().Serialize(stream2, deserProof);

        Console.WriteLine("buffer length now is " + stream2.ToArray().Length); // outputs 333 on my machine!!
        Console.WriteLine("What? I cannot believe my eyes! Someone help me ........");

        Console.Read();
    }
}
Aliostad answered Nov 01 '22 17:11
