
Why is Joda Time serialized form so large, and what to do about it?

On my machine, the following code snippet:

DateTime now = DateTime.now();
System.out.println(now);
System.out.println("Date size:\t\t"+serialiseToArray(now).length);
System.out.println("DateString size:\t"+serialiseToArray(now.toString()).length);
System.out.println("java.util.Date size:\t"+serialiseToArray(new Date()).length);
Duration twoHours = Duration.standardHours(2);
System.out.println(twoHours);
System.out.println("Duration size:\t\t"+serialiseToArray(twoHours).length);
System.out.println("DurationString size:\t"+serialiseToArray(twoHours.toString()).length);

Gives the following output:

2013-09-09T15:07:44.642+01:00
Date size:      273
DateString size:    36
java.util.Date size:    46
PT7200S
Duration size:      107
DurationString size:    14

As you can see, the serialized org.joda.time.DateTime object is more than 5 times larger than both its String form, which seems to describe it completely, and the equivalent java.util.Date. The Duration object representing 2 hours is also much larger than I would expect: looking at the source, its only member variable appears to be a single long value.

Why are these serialized objects so large? And is there any pre-existing solution for getting a smaller representation?

The serialiseToArray method, for reference:

private static byte[] serialiseToArray(Serializable s)
{
    try
    {
        ByteArrayOutputStream byteArrayBuffer = new ByteArrayOutputStream();
        new ObjectOutputStream(byteArrayBuffer).writeObject(s);
        return byteArrayBuffer.toByteArray();
    }
    catch (IOException ex)
    {
        throw new RuntimeException(ex);
    }
}
MikeFHay asked Feb 15 '23 08:02


2 Answers

Serialization has some overhead. In this instance the overhead you notice most is that the class structure is described in the actual output. Since Duration has a serializable base class (BaseDuration) that gets its own class description, including its field iMillis, that overhead is larger than for Date, which has no serializable base class.

Those classes are referenced by their fully-qualified class names in the serialized data, and those names add a fair number of bytes.

Good news: that overhead is only paid once per output stream. If you serialize another Duration object to the same stream, the output grows only slightly.
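To see the once-per-stream effect without Joda Time on the classpath, here is a minimal sketch. The classes `BaseSpan` and `Span` are hypothetical stand-ins that only mimic the shape of BaseDuration and Duration (a serializable base class holding one long, plus an empty subclass); they are not Joda classes, and the exact byte counts will differ from the ones in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class StreamOverheadDemo {

    // Hypothetical stand-in for BaseDuration: serializable, one long field.
    static class BaseSpan implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long millis;
        BaseSpan(long millis) { this.millis = millis; }
    }

    // Hypothetical stand-in for Duration: adds no fields of its own.
    static class Span extends BaseSpan {
        private static final long serialVersionUID = 1L;
        Span(long millis) { super(millis); }
    }

    // Writes all objects to ONE ObjectOutputStream and returns the bytes.
    static byte[] serialiseAll(Serializable... objects) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                for (Serializable o : objects) {
                    out.writeObject(o);
                }
            }
            return buffer.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        int one = serialiseAll(new Span(1L)).length;
        int two = serialiseAll(new Span(1L), new Span(2L)).length;
        System.out.println("one object:  " + one);
        System.out.println("two objects: " + two);
        // The class descriptions are written only for the first object.
        System.out.println("cost of the second object: " + (two - one));
    }
}
```

The second object should cost far less than the first, because the stream replaces its class description with a back-reference to the one already written.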

I've used the jdeserialize project to look at the result of serializing a java.util.Date vs. a Duration (note that this tool does not need access to the .class files, so all the information it dumps is actually contained in the serialized data):

The result for java.util.Date:

read: java.util.Date _h0x7e0001 = r_0x7e0000;
//// BEGIN stream content output
java.util.Date _h0x7e0001 = r_0x7e0000;
//// END stream content output

//// BEGIN class declarations (excluding array classes)
class java.util.Date implements java.io.Serializable {
}

//// END class declarations

//// BEGIN instance dump
[instance 0x7e0001: 0x7e0000/java.util.Date
  object annotations:
    java.util.Date
        [blockdata 0x00: 8 bytes]

  field data:
    0x7e0000/java.util.Date:
]
//// END instance dump

The result for Duration:

read: org.joda.time.Duration _h0x7e0002 = r_0x7e0000;
//// BEGIN stream content output
org.joda.time.Duration _h0x7e0002 = r_0x7e0000;
//// END stream content output

//// BEGIN class declarations (excluding array classes)
class org.joda.time.Duration extends org.joda.time.base.BaseDuration implements java.io.Serializable {
}

class org.joda.time.base.BaseDuration implements java.io.Serializable {
    long iMillis;
}

//// END class declarations

//// BEGIN instance dump
[instance 0x7e0002: 0x7e0000/org.joda.time.Duration
  field data:
    0x7e0001/org.joda.time.base.BaseDuration:
        iMillis: 0
    0x7e0000/org.joda.time.Duration:
]
//// END instance dump

Note that the "class declaration" block is quite a bit longer for Duration. This also explains why serializing a single Duration takes 107 bytes, but serializing two (distinct) Duration objects to the same stream takes only 121 bytes: the second object adds just 14 bytes.
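If even the one-off class description is too much (for example when each value is stored or sent on its own), one possible workaround, which is not part of the original answer, is to skip Java serialization and write only the underlying long. The sketch below uses only the JDK; the class name `CompactDurationCodec` is made up for illustration. With Joda Time available, the long would presumably come from `Duration.getMillis()` and be turned back into an object with `new Duration(long)`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CompactDurationCodec {

    // Encodes a duration, given in milliseconds, as a raw 8-byte long.
    static byte[] encode(long millis) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(millis);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    // Reads the milliseconds back from the 8-byte form.
    static long decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readLong();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        long twoHours = 2L * 60 * 60 * 1000; // 7,200,000 ms
        byte[] encoded = encode(twoHours);
        System.out.println("encoded size: " + encoded.length);
        System.out.println("round trip:   " + decode(encoded));
    }
}
```

The trade-off is that the bytes no longer say what type they hold, so the reader has to know that from context. For a DateTime, the time zone or chronology would have to be written separately if it matters.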

Joachim Sauer answered Mar 23 '23 00:03


From the source:

Internally, the class holds two pieces of data. Firstly, it holds the datetime as milliseconds from the Java epoch of 1970-01-01T00:00:00Z. Secondly, it holds a Chronology which determines how the millisecond instant value is converted into the date time fields. The default Chronology is org.joda.time.chrono.ISOChronology which is the agreed international standard and compatible with the modern Gregorian calendar.

ISOChronology derives from AssembledChronology, most (but not all) of whose fields are declared transient, so only the non-transient part contributes to the serialized form.
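As a rough illustration of what transient does to the serialized size, here is a small sketch. The two classes are hypothetical and not taken from the Joda Time source; they differ only in whether a derived array is marked transient.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class TransientFieldDemo {

    // Hypothetical class: the derived table is transient, so it is skipped
    // on write (and would be null after deserialization unless rebuilt).
    static class WithTransientCache implements Serializable {
        private static final long serialVersionUID = 1L;
        long millis = 42L;
        transient long[] cache = new long[1000];
    }

    // Same shape, but the table is an ordinary field and is written out.
    static class WithSerializedCache implements Serializable {
        private static final long serialVersionUID = 1L;
        long millis = 42L;
        long[] cache = new long[1000];
    }

    // Returns the number of bytes produced by default Java serialization.
    static int serialisedSize(Serializable s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(s);
            }
            return buffer.size();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("transient cache:  " + serialisedSize(new WithTransientCache()));
        System.out.println("serialized cache: " + serialisedSize(new WithSerializedCache()));
    }
}
```

The non-transient variant carries the whole array (at least 8000 bytes of long data), while the transient one writes only the class description and the single long.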

Brian Agnew answered Mar 22 '23 23:03