Spark SQL - Encoders for Tuple Containing a List or Array as an Element

Using Spark 2.2 + Java 1.8

I have two custom data types, "Foo" and "Bar". Each one implements Serializable. 'Foo' has a one-to-many relationship with 'Bar', so their relationship is represented as a Tuple2:

Tuple2<Foo, List<Bar>>

Typically, when I have a 1:1 relationship, I can encode to my custom types like so:

Encoder<Tuple2<Foo,Bar>> fooBarEncoder = Encoders.tuple(Encoders.bean(Foo.class),Encoders.bean(Bar.class));

and then use it to encode my Dataset:

Dataset<Tuple2<Foo,Bar>> fooBarSet = getSomeData().as(fooBarEncoder);

But I am having trouble finding a way to encode for the scenario when I have a list (or an array) as a Tuple2 element. What I would like to be able to do is to provide an encoder for the second element like this:

Encoder<Tuple2<Foo,List<Bar>>> fooBarEncoder = Encoders.tuple(Encoders.bean(Foo.class), List<Bar>.class);

and then encode to my dataset:

Dataset<Tuple2<Foo,List<Bar>>> fooBarSet = getSomeData().as(fooBarEncoder);

But obviously I cannot invoke .class on a parameterized type like List.

I know that for String and primitive types, arrays are supported by Spark implicits, e.g.:

sparkSession.implicits().newStringArrayEncoder()

But how would I create an encoder for a List or Array of a custom class type?

asked May 02 '18 by HansGruber


1 Answer

I'm not sure how well this method could be implemented within your setup, but here goes. Create a wrapper class for your list and try it out:

import java.io.Serializable;
import java.util.List;

// Bean-style wrapper so Encoders.bean can describe the List<Bar> element.
public class BarList implements Serializable {
    private List<Bar> list;

    public List<Bar> getList() {
        return list;
    }

    public void setList(List<Bar> l) {
        list = l;
    }
}
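For illustration, here is a minimal sketch of how the wrapper might then plug into the tuple encoder. getSomeData() is your own method, and its rows are assumed to match the (Foo, BarList) bean schema:

// Both sides of the tuple are now beans, so Encoders.bean works for each.
Encoder<Tuple2<Foo, BarList>> fooBarsEncoder =
        Encoders.tuple(Encoders.bean(Foo.class), Encoders.bean(BarList.class));

Dataset<Tuple2<Foo, BarList>> fooBarsSet = getSomeData().as(fooBarsEncoder);

// Unwrap the list through the getter when it is needed downstream.
// (ForeachFunction is org.apache.spark.api.java.function.ForeachFunction.)
fooBarsSet.foreach((ForeachFunction<Tuple2<Foo, BarList>>) pair -> {
    Foo foo = pair._1();
    List<Bar> bars = pair._2().getList();
    // ... work with foo and bars ...
});

The trade-off is that the second column is encoded as a struct with a single "list" field rather than a bare array, so downstream code (or SQL) has to go through the wrapper.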
answered Oct 26 '22 by CHEESEBOT314