Using Spark 2.2 + Java 1.8
I have two custom data types, "Foo" and "Bar". Each one implements Serializable. "Foo" has a one-to-many relationship with "Bar", so their relationship is represented as a Tuple:
Tuple2<Foo, List<Bar>>
Typically, when I have a 1:1 relationship, I can create an encoder for my custom types like so:
Encoder<Tuple2<Foo,Bar>> fooBarEncoder = Encoders.tuple(Encoders.bean(Foo.class),Encoders.bean(Bar.class));
and then use it to encode my Dataset:
Dataset<Tuple2<Foo,Bar>> fooBarSet = getSomeData().as(fooBarEncoder);
But I am having trouble finding a way to encode for the scenario when I have a list (or an array) as a Tuple2 element. What I would like to be able to do is to provide an encoder for the second element like this:
Encoder<Tuple2<Foo,List<Bar>>> fooBarEncoder = Encoders.tuple(Encoders.bean(Foo.class), List<Bar>.class);
and then encode to my dataset:
Dataset<Tuple2<Foo,List<Bar>>> fooBarSet = getSomeData().as(fooBarEncoder);
But obviously I cannot invoke .class on a parameterized type like List<Bar>.
I know that for String and primitive types, arrays are supported by Spark implicits, e.g.:
sparkSession.implicits().newStringArrayEncoder()
But how would I create an encoder for a List or Array of a custom class type?
I'm not sure how well this approach fits your setup, but here goes: create a wrapper bean class for your list and try it out.
import java.io.Serializable;
import java.util.List;

public class BarList implements Serializable {
    // Bean-style wrapper so Encoders.bean can handle the list
    private List<Bar> list;

    public List<Bar> getList() {
        return list;
    }

    public void setList(List<Bar> l) {
        list = l;
    }
}
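With the wrapper in place, both tuple elements are plain beans, so the same Encoders.tuple pattern from the question should work. A sketch, assuming getSomeData() returns data shaped to match (as in the question; you would need to map your List<Bar> values into BarList instances first):

```java
// Both elements are now beans, so a tuple encoder can be composed as usual
Encoder<Tuple2<Foo, BarList>> fooBarEncoder =
    Encoders.tuple(Encoders.bean(Foo.class), Encoders.bean(BarList.class));

Dataset<Tuple2<Foo, BarList>> fooBarSet = getSomeData().as(fooBarEncoder);
```

The trade-off is that downstream code sees Tuple2<Foo, BarList> rather than Tuple2<Foo, List<Bar>>, so you call getList() on the second element to reach the underlying list.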