Easiest way to convert a TableRow to a JSON-formatted String in Dataflow 2.x?

Short of writing my own function to do it, what is the easiest way to convert a TableRow object, inside a Dataflow 2.x pipeline, to a JSON-formatted String?

I thought the code below would work, but it doesn't put quotes around keys and values correctly, especially where there are nested fields.

public static class TableRowToString extends DoFn<TableRow, String> {
  private static final long serialVersionUID = 1L;

  @ProcessElement
  public void processElement(ProcessContext c) {
    // toString() is what produces the unquoted, non-JSON output described above
    c.output(c.element().toString());
  }
}

Max asked Mar 08 '23 05:03


2 Answers

Use Gson and call gson.toJson(yourTableRow).
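For context on why a serializer is needed at all: TableRow is a map-backed object, and a map-style toString() typically prints key=value pairs without quotes, which matches the symptom described in the question. The sketch below is only an illustration under stated assumptions: a plain LinkedHashMap stands in for TableRow (no Beam or BigQuery classes are used), and MapToJsonSketch, toJson, quote and sampleRow are hypothetical names. In a real pipeline, Gson or Jackson would do this work for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class MapToJsonSketch {

    // Recursively renders maps, iterables, numbers, booleans, strings and null as JSON text.
    // Simplification: NaN and infinite numbers are not handled and would yield invalid JSON.
    public static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Map) {
            StringJoiner obj = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                obj.add(quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
            }
            return obj.toString();
        }
        if (value instanceof Iterable) {
            StringJoiner arr = new StringJoiner(",", "[", "]");
            for (Object item : (Iterable<?>) value) {
                arr.add(toJson(item));
            }
            return arr.toString();
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        return quote(value.toString());
    }

    // Wraps a string in double quotes and escapes quotes, backslashes and control characters.
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (ch < 0x20) {
                        sb.append(String.format("\\u%04x", (int) ch));
                    } else {
                        sb.append(ch);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Hypothetical row with one nested record, standing in for a TableRow.
    public static Map<String, Object> sampleRow() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "London");
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "Ada");
        row.put("age", 36);
        row.put("address", address);
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = sampleRow();
        System.out.println(row);          // {name=Ada, age=36, address={city=London}}
        System.out.println(toJson(row));  // {"name":"Ada","age":36,"address":{"city":"London"}}
    }
}
```

The first line printed is the map-style form with no quotes; the second is valid JSON. A library serializer gives you the second form without hand-written escaping logic.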


PUG answered Mar 10 '23 13:03


I ran into the same problem and solved it by using org.apache.beam.sdk.extensions.jackson.AsJsons.

You don't need to write a new transform to use it; you can apply it directly in the pipeline.

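import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.jackson.AsJsons;
import org.apache.beam.sdk.values.PCollection;

Pipeline p = Pipeline.create(options);

// Placeholder: substitute the transform that returns your PCollection<TableRow>.
PCollection<TableRow> rows = p.apply(/* transform that returns a PCollection of TableRow */);

// AsJsons uses Jackson to serialize each element to a JSON string.
PCollection<String> json = rows.apply("JSON Transform", AsJsons.of(TableRow.class));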

If you manage your project with Maven, add this to the <dependencies> section of your pom.xml (use the version that matches your Beam SDK):

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-extensions-json-jackson</artifactId>
  <version>2.5.0</version>
  <scope>compile</scope>
</dependency>

Rafael Alves answered Mar 10 '23 11:03