
Is there an API implementation of Avro's "duration" logical type?

The current Apache Avro (1.8.2) documentation mentions a "duration" logical type:

A duration logical type annotates Avro fixed type of size 12, which stores three little-endian unsigned integers that represent durations at different granularities of time. The first stores a number in months, the second stores a number in days, and the third stores a number in milliseconds.

While this all makes sense, I can't find an actual implementation in either the .Net or Java libraries. The documentation for logical types clearly lists every logical type except duration (date, time-millis, time-micros, timestamp-millis and timestamp-micros).

The "duration" is defined in my Avro schema accordingly:

{
    "type": "record",
    "name": "DataBlock",
    "fields": [
    {
        "name": "duration",
        "type": {
            "type": "fixed",
            "name": "DataBlockDuration",
            "size": 12
        }
    }]
}

In .Net (excuse the VB), I have to manually serialise durations:

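' Pack months, days and milliseconds as three 4-byte integers into a 12-byte array.
' BitConverter uses the machine's byte order, so the output is little-endian
' only where BitConverter.IsLittleEndian is True.
Dim ret(11) As Byte
Dim months = BitConverter.GetBytes(duration.Months)
Dim days = BitConverter.GetBytes(duration.Days)
Dim milliseconds = BitConverter.GetBytes(duration.Milliseconds)

Array.Copy(months, 0, ret, 0, 4)
Array.Copy(days, 0, ret, 4, 4)
Array.Copy(milliseconds, 0, ret, 8, 4)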

When deserialising in Java, I have to convert to org.joda.time.Period by doing this:

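import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import org.joda.time.Period;

// View the 12 bytes as three little-endian 32-bit ints: months, days, milliseconds.
// IntBuffer.get returns a signed int, so values above Integer.MAX_VALUE would come out negative.
IntBuffer buf = ByteBuffer
                  .wrap(dataBlock.getDuration().bytes())
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asIntBuffer();

Period period = Period
                  .months(buf.get(0))
                  .withDays(buf.get(1))
                  .withMillis(buf.get(2));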

Am I missing something, or did the Avro team write a spec and forget to implement it? It seems that this data type in particular has to be implemented without any help from the Avro API at all.

Russell Phillips asked Apr 24 '18 01:04



2 Answers

Joda-Time

The Joda-Time project is now in maintenance mode, with the team advising migration to the java.time classes. Concepts are similar, as both projects were led by the same man, Stephen Colebourne.

java.time

The java.time framework offers two separate classes to represent a span of time unattached to the timeline:

  • Period
    A number of years, months, and days.
  • Duration
    A number of days (generic 24-hour chunks of time unrelated to the calendar), hours, minutes, seconds, and a fractional second (nanoseconds).

You could use your first two numbers as a Period, and the third number for a Duration.

Period p = Period.ofMonths( months ).plusDays( days ) ;
Duration d = Duration.ofMillis( millis ) ;

You might want to normalize the years & months of the Period object. For example, a period of "15 months" will be normalized to "1 year and 3 months".

Period p = Period.ofMonths( months ).plusDays( days ).normalized() ;
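
Putting these pieces together, a minimal sketch of reading the 12-byte Avro fixed value into java.time objects might look like the following. It assumes the raw bytes are already available as a byte[] (for example from the bytes() call shown in the Question). The class name AvroDurationDecoder and its nested Parts holder are hypothetical names for illustration, not part of Avro or java.time.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Duration;
import java.time.Period;

// Hypothetical helper for illustration; not an Avro or java.time API.
final class AvroDurationDecoder {

    // Hypothetical holder for the calendar part and the time part.
    static final class Parts {
        final Period period;     // months + days
        final Duration duration; // milliseconds

        Parts(Period period, Duration duration) {
            this.period = period;
            this.duration = duration;
        }
    }

    static Parts decode(byte[] fixed12) {
        if (fixed12.length != 12) {
            throw new IllegalArgumentException("Expected 12 bytes, got " + fixed12.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(fixed12).order(ByteOrder.LITTLE_ENDIAN);

        // The spec stores unsigned 32-bit values; widen to long to keep them non-negative.
        long months = Integer.toUnsignedLong(buf.getInt());
        long days = Integer.toUnsignedLong(buf.getInt());
        long millis = Integer.toUnsignedLong(buf.getInt());

        // Period holds int fields, so Math.toIntExact throws ArithmeticException
        // for months or days above Integer.MAX_VALUE.
        Period period = Period.of(0, Math.toIntExact(months), Math.toIntExact(days));
        Duration duration = Duration.ofMillis(millis);
        return new Parts(period, duration);
    }

    public static void main(String[] args) {
        // Sample input: 15 months, 4 days, 45,005,000 milliseconds.
        byte[] raw = ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(15).putInt(4).putInt(45_005_000)
                .array();

        Parts parts = decode(raw);
        System.out.println(parts.period.normalized()); // P1Y3M4D
        System.out.println(parts.duration);            // PT12H30M5S
    }
}
```

Keeping the two values separate here is a design choice that follows the Period/Duration split described above; whether to combine them afterwards is up to you.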

ISO 8601

The java.time classes use the standard ISO 8601 formats when parsing/generating strings.

For a period or duration, that means using the PnYnMnDTnHnMnS format. The P marks the beginning, and the T separates any years-months-days from any hours-minutes-seconds. For example, "P3Y6M4DT12H30M5S" represents a duration of "three years, six months, four days, twelve hours, thirty minutes, and five seconds".

To generate such a string, simply call toString on a Period or Duration. To parse, call parse.
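
As a small illustration of that round trip, here is a sketch using only java.time. One caveat worth noting: Period.parse accepts only the date part (years, months, weeks, days) and Duration.parse accepts only days and the time part, so a combined string such as P3Y6M4DT12H30M5S has to be split at the T first. The class Iso8601SpanDemo and its helper methods are hypothetical, and the sketch assumes an unsigned string with an uppercase T.

```java
import java.time.Duration;
import java.time.Period;

// Hypothetical demo class; the helpers below are not part of java.time.
final class Iso8601SpanDemo {

    // Returns the years-months-days portion (everything before the 'T').
    static Period datePart(String iso) {
        int t = iso.indexOf('T');
        String date = (t < 0) ? iso : iso.substring(0, t);
        // A bare "P" means the string had no date fields at all.
        return date.equals("P") ? Period.ZERO : Period.parse(date);
    }

    // Returns the hours-minutes-seconds portion (everything from the 'T' on).
    static Duration timePart(String iso) {
        int t = iso.indexOf('T');
        return (t < 0) ? Duration.ZERO : Duration.parse("P" + iso.substring(t));
    }

    public static void main(String[] args) {
        // Generating strings: toString produces the ISO 8601 form.
        System.out.println(Period.ofMonths(15).plusDays(4));  // P15M4D
        System.out.println(Duration.ofMillis(45_005_000));    // PT12H30M5S

        // Parsing a combined string by splitting it first.
        String combined = "P3Y6M4DT12H30M5S";
        System.out.println(datePart(combined)); // P3Y6M4D
        System.out.println(timePart(combined)); // PT12H30M5S
    }
}
```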

Odd concepts in Avro

That Avro concept of duration (months + days + milliseconds) seems quite odd to me. The biggest problem is that mixing years-months-days with hours-minutes-seconds rarely makes any practical sense (think about it). And tracking months but not years is surprising.

org.threeten.extra.PeriodDuration

If you insist on wanting to merge the years-months-days with hours-minutes-seconds, consider adding the ThreeTen-Extra library to your project. It offers a PeriodDuration class.

PeriodDuration pd = PeriodDuration.of( p , d ) ;  // Pass `Period` and `Duration` objects as covered above.

Again, you will likely want to call normalizedStandardDays and normalizedYears.


About java.time

The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.

The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.

To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.

You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes.

Where to obtain the java.time classes?

  • Java SE 8, Java SE 9, Java SE 10, and later
    • Built-in.
    • Part of the standard Java API with a bundled implementation.
    • Java 9 adds some minor features and fixes.
  • Java SE 6 and Java SE 7
    • Much of the java.time functionality is back-ported to Java 6 & 7 in ThreeTen-Backport.
  • Android
    • Later versions of Android bundle implementations of the java.time classes.
    • For earlier Android (<26), the ThreeTenABP project adapts ThreeTen-Backport (mentioned above). See How to use ThreeTenABP….

The ThreeTen-Extra project extends java.time with additional classes. This project is a proving ground for possible future additions to java.time. You may find some useful classes here such as Interval, YearWeek, YearQuarter, and more.

Basil Bourque answered Oct 20 '22 21:10


According to the Apache issue tracker AVRO-2123, the logical duration type has been specified but not yet implemented.

So yes, the Apache team wrote the spec but forgot to implement this part of it.

I also searched the unzipped Avro 1.8.2 jar file for any import of the Joda library and found only the class org.apache.avro.data.TimeConversions, which contains conversions for other logical types such as "date" (mapped to org.joda.time.LocalDate), but none for the Joda class Period.

Your workaround of using Joda's Period class seems sound because:

  • Avro still uses Joda-Time (although the latter is in maintenance mode),
  • the Period class can fully represent the Avro spec's duration in months, days and milliseconds (and using unsigned ints, as the Avro spec requires for an always-positive duration, also helps avoid odd periods with mixed signs).
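
To illustrate the unsigned-integer point: Java's int is signed, so a value read with getInt() that exceeds 2,147,483,647 would appear negative unless it is widened with Integer.toUnsignedLong. A minimal sketch of a writer and reader for the 12-byte layout, using only JDK classes, could look like this. AvroDurationBytes and its method names are hypothetical, not an Avro API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of the Avro library.
final class AvroDurationBytes {

    private static final long MAX_UNSIGNED_INT = 0xFFFF_FFFFL; // 4,294,967,295

    // Writes months, days and milliseconds as three little-endian unsigned 32-bit ints.
    static byte[] encode(long months, long days, long millis) {
        checkRange("months", months);
        checkRange("days", days);
        checkRange("millis", millis);
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt((int) months) // the cast keeps the low 32 bits
                .putInt((int) days)
                .putInt((int) millis)
                .array();
    }

    // Reads the three values back as non-negative longs: {months, days, millis}.
    static long[] decode(byte[] fixed12) {
        if (fixed12.length != 12) {
            throw new IllegalArgumentException("Expected 12 bytes, got " + fixed12.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(fixed12).order(ByteOrder.LITTLE_ENDIAN);
        return new long[] {
            Integer.toUnsignedLong(buf.getInt()),
            Integer.toUnsignedLong(buf.getInt()),
            Integer.toUnsignedLong(buf.getInt())
        };
    }

    private static void checkRange(String name, long value) {
        if (value < 0 || value > MAX_UNSIGNED_INT) {
            throw new IllegalArgumentException(name + " out of unsigned 32-bit range: " + value);
        }
    }

    public static void main(String[] args) {
        // Round trip with a millisecond value that does not fit in a signed int.
        long[] parts = decode(encode(1, 2, 4_000_000_000L));
        System.out.println(parts[0] + " months, " + parts[1] + " days, " + parts[2] + " ms");
    }
}
```

Note that Joda's Period and java.time's Period both store int fields, so values in the upper half of the unsigned range still need a range check before being mapped onto either class.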

Possible alternatives for Joda-Time which I am aware of:

  • Threeten-Extra-class PeriodDuration (see the answer of Basil Bourque)
  • Time4J-class net.time4j.Duration (my lib)

The ThreeTen-Extra class has fewer features than the Joda class (no localization at all, reduced ISO 8601 compliance, etc.) but might still be enough for your Avro-specific scenario, while the Time4J class offers even more features than Joda (in the areas of ISO compliance, formatting, parsing, normalizing, etc.).

Meno Hochschild answered Oct 20 '22 22:10