I am curious to understand the best practices for encoding two very specific types of data within Avro: Timestamps and IP Addresses.
I came across the open JIRA ticket for Timestamps (https://issues.apache.org/jira/browse/AVRO-739), but it looks like the topic has been quiet for some time. So, what are the best practices for encoding Timestamps in Avro, preferably for downstream use in a MapReduce, Pig, Hive, or Streaming context?
Furthermore, I would be interested to hear what other people are doing to encode IP Addresses into Avro.
I have some experience with encoding these kinds of types in Avro. In my case a big requirement is accessing the data through Hive.
For timestamps I would recommend using a float with Unix timestamps. This is supported by most other libraries and works easily with Hive, since you can cast it to a timestamp.
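As a rough illustration of the Unix timestamp approach, here is a minimal Python sketch using only the standard library. The record name `Event` and the field names are hypothetical. Note that the sketch swaps in `double` where the answer says float: Avro's `float` is a 32-bit IEEE 754 single, which cannot hold a present-day Unix timestamp to one-second precision, as `roundtrip_float32` shows. The second field uses the `timestamp-millis` logical type from newer versions of the Avro specification; whether your Avro and Hive versions honor it is an assumption you would need to check.

```python
import json
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical schema: record and field names are illustrative only.
EVENT_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        # Unix seconds as a 64-bit floating point number.
        {"name": "ts_seconds", "type": "double"},
        # Unix milliseconds as a long, annotated with a logical type
        # (assumes an Avro version that knows timestamp-millis).
        {"name": "ts_millis",
         "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}


def to_unix_seconds(dt):
    """Seconds since the epoch as a float; dt must be timezone-aware."""
    return (dt - EPOCH).total_seconds()


def to_unix_millis(dt):
    """Whole milliseconds since the epoch; dt must be timezone-aware."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def roundtrip_float32(value):
    """Squeeze a number through 32-bit float storage and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


if __name__ == "__main__":
    when = datetime(2012, 1, 1, tzinfo=timezone.utc)
    record = {"ts_seconds": to_unix_seconds(when),
              "ts_millis": to_unix_millis(when)}
    print(json.dumps(EVENT_SCHEMA, indent=2))
    print(record)
    # One second past midnight does not survive 32-bit float storage.
    print(roundtrip_float32(1325376001.0))
```

The schema dictionary can be serialized with `json.dumps` and handed to whichever Avro library you use; the conversion helpers are independent of that choice.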
For IP Addresses I would use a string encoding. I think the readability of strings when working with the data makes it the best type to go for. If you have other requirements, such as keeping the data size down, a binary encoding might be better for you.
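To make the string versus binary trade-off concrete, here is a small Python sketch using the standard `ipaddress` module. The field name `client_ip` and both schema fragments are hypothetical; the binary variant assumes you store the packed address in an Avro `bytes` field (4 bytes for IPv4, 16 for IPv6).

```python
import ipaddress

# Hypothetical field definitions: pick one or the other for a record.
STRING_FIELD = {"name": "client_ip", "type": "string"}
BYTES_FIELD = {"name": "client_ip", "type": "bytes"}


def ip_to_bytes(text):
    """Pack a dotted-quad or IPv6 string into its raw network-order bytes."""
    return ipaddress.ip_address(text).packed


def bytes_to_ip(raw):
    """Turn 4 or 16 packed bytes back into the canonical string form."""
    return str(ipaddress.ip_address(raw))


if __name__ == "__main__":
    for text in ("192.168.0.1", "2001:db8::1"):
        raw = ip_to_bytes(text)
        # Compare payload sizes: readable string vs packed bytes.
        print(text, len(text.encode("utf-8")), len(raw), bytes_to_ip(raw))
```

The string form stays readable in Hive queries, while the packed form is smaller but needs a conversion step like `bytes_to_ip` wherever the data is read.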