Short of writing my own function to do it, what is the easiest way to convert a TableRow
object, inside a Dataflow 2.x pipeline, to a JSON-formatted String?
I thought the code below would work, but it isn't correctly inserting quotes around keys and values, especially where there are nested fields.
public static class TableRowToString extends DoFn<TableRow, String> {
    private static final long serialVersionUID = 1L;

    @ProcessElement
    public void processElement(ProcessContext c) {
        c.output(c.element().toString());
    }
}
Use Gson
and call gson.toJson(yourTableRow) on each element.
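For illustration, a minimal sketch of that approach as a replacement for the DoFn in the question (assuming com.google.code.gson:gson is on the worker classpath; the class name TableRowToJson is just illustrative):
import com.google.api.services.bigquery.model.TableRow;
import com.google.gson.Gson;
import org.apache.beam.sdk.transforms.DoFn;

public class TableRowToJson extends DoFn<TableRow, String> {
    private static final long serialVersionUID = 1L;

    // Gson is not Serializable, so keep it transient and create it in @Setup.
    private transient Gson gson;

    @Setup
    public void setup() {
        gson = new Gson();
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        // TableRow implements Map, so Gson writes it as a JSON object,
        // quoting keys and string values, including nested records.
        c.output(gson.toJson(c.element()));
    }
}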
I ran into the same problem and solved it by using org.apache.beam.sdk.extensions.jackson.AsJsons.
There is no need to write your own transform; you can apply it directly in the pipeline.
import org.apache.beam.sdk.extensions.jackson.AsJsons;

Pipeline p = Pipeline.create(options);
p.apply("The transform that returns a PCollection of TableRow")
    .apply("JSON Transform", AsJsons.of(TableRow.class));
And if you are managing your project with Maven, you can add this to the <dependencies> section of your pom.xml file:
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-extensions-json-jackson</artifactId>
    <version>2.5.0</version>
    <scope>compile</scope>
</dependency>