In my Dataflow pipeline, I'm setting the field impressions_raw as a Long in a com.google.api.services.bigquery.model.TableRow object.
Further on in my pipeline, I read the TableRow back out. But instead of a Long, I get back an Integer.
However, if I explicitly set the value to be a Long greater than Integer.MAX_VALUE, for example 3 billion, then I get back a Long!
It seems that the Dataflow SDK is doing some sort of type-check optimization under the hood.
So, without doing ugly type checking, how should one programmatically deal with this? (Maybe I missed something obvious.)
Thanks for the report. Unfortunately, this problem is fundamental to the use of TableRow. We strongly recommend solution 1 below: convert away from TableRow as soon as practical in your pipeline.
The TableRow object in which you are storing these values is serialized and deserialized by Jackson, inside TableRowJsonCoder. Jackson has exactly the behavior you're describing. That is, for this class:
class MyClass {
    Object v;
}
it will serialize an instance with v = Long.valueOf(<number>) as {v: 30} or {v: 3000000000}. On deserializing, however, it will determine the type of the object using the number of bits needed to represent the value, so 30 comes back as an Integer and 3000000000 comes back as a Long. See this SO post.
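To make the round-trip behavior concrete, here is a minimal sketch in plain Java that mimics the "smallest type that fits" rule described above. It does not call Jackson; the class NarrowestNumber and the method narrowest are hypothetical names used only for illustration.

```java
final class NarrowestNumber {
    // Hypothetical helper: mimics the rule described above, where an untyped
    // integer becomes an Integer if it fits in 32 bits and a Long otherwise.
    // This is an illustration, not Jackson's actual implementation.
    static Number narrowest(String jsonInteger) {
        long value = Long.parseLong(jsonInteger);
        if (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE) {
            return Integer.valueOf((int) value);
        }
        return Long.valueOf(value);
    }

    public static void main(String[] args) {
        Number small = narrowest("30");
        Number large = narrowest("3000000000");
        System.out.println(small.getClass().getSimpleName()); // prints "Integer"
        System.out.println(large.getClass().getSimpleName()); // prints "Long"
        // Casting `small` to Long would throw ClassCastException;
        // Number.longValue() works for both.
        System.out.println(small.longValue() + large.longValue());
    }
}
```

The point of the sketch is that the original Long type is not recorded anywhere in the JSON, so the reader can only guess from the magnitude of the value.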
Two possible solutions come to mind, with solution 1 strongly recommended:
1. Do not use TableRow as an intermediate value. In other words, convert to a POJO as soon as possible. The key reason this type mix-up happens is that TableRow is essentially a Map<String, Object>, and Jackson (or other coders) cannot know that you want a Long back. With a POJO, the types would be clear.
The other advantage of switching off of TableRow is to get to an efficient coder, say, AvroCoder. Because TableRows are encoded and decoded to/from JSON, the encoding is both verbose and slow, so shuffling TableRow objects will be both CPU- and I/O-intensive. I expect you'll see much better performance with Avro-coded POJOs than if you're passing TableRow objects around.
For an example, see LaneInfo in TrafficMaxLaneFlow.
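As a rough sketch of what converting to a POJO early might look like, the class below reads the field once through Number and stores it as a typed long. The class ImpressionRecord and its fromRow method are hypothetical names, and a plain Map<String, Object> stands in for TableRow so the example runs without the Dataflow SDK. In a real pipeline you would presumably also register a coder such as AvroCoder for the class; that part is assumed and not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical POJO; the field type is fixed at compile time, so no
// Integer-versus-Long guessing is needed downstream.
final class ImpressionRecord {
    long impressionsRaw;

    ImpressionRecord() {}

    // A Map<String, Object> stands in for TableRow here (assumption for the sketch).
    static ImpressionRecord fromRow(Map<String, Object> row) {
        Object raw = row.get("impressions_raw");
        if (!(raw instanceof Number)) {
            throw new IllegalArgumentException("impressions_raw is missing or not numeric: " + raw);
        }
        ImpressionRecord record = new ImpressionRecord();
        // longValue() works whether the row holds an Integer or a Long.
        record.impressionsRaw = ((Number) raw).longValue();
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("impressions_raw", Integer.valueOf(30)); // what the row may hold after a round trip
        System.out.println(fromRow(row).impressionsRaw); // prints 30
    }
}
```

Doing the conversion in one place, right after the read, keeps the type handling out of the rest of the pipeline.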
2. Write code that can handle both:
// Variant 1: return a primitive long.
long numberToLong(@Nonnull Number n) {
    return n.longValue();
}
long x = numberToLong((Number) row.get("field"));
// Variant 2: return a boxed Long.
Long numberToLong(@Nonnull Number n) {
    if (n instanceof Long) {
        // avoid a copy; the cast is required because n is declared as Number
        return (Long) n;
    }
    return Long.valueOf(n.longValue());
}
Long x = numberToLong((Number) row.get("field"));
You may need additional checks to the second variant if n may be null.
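One possible shape for a null-tolerant version is sketched below. The class RowNumbers and the method toLongOrNull are hypothetical names, and returning null for a missing field is just one design choice; throwing or substituting a default may suit your pipeline better.

```java
final class RowNumbers {
    // Hypothetical helper: accepts the raw Object read from the row and
    // returns a Long, passing null through instead of throwing.
    static Long toLongOrNull(Object value) {
        if (value == null) {
            return null; // field absent or explicitly null
        }
        if (value instanceof Long) {
            return (Long) value; // already a Long, no copy needed
        }
        if (value instanceof Number) {
            return Long.valueOf(((Number) value).longValue());
        }
        throw new IllegalArgumentException("Not a numeric value: " + value);
    }

    public static void main(String[] args) {
        System.out.println(toLongOrNull(Integer.valueOf(30))); // prints 30
        System.out.println(toLongOrNull(null));                // prints null
    }
}
```

Taking Object rather than Number moves the cast inside the helper, so call sites can pass row.get("field") directly.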