How to print <String, Array[]> as a flat pair?

Setup:

I have data about customers and their top 10 favorite TV shows. So far, I have this data in a JavaRDD<Tuple2<String, Shows[]>>. I can print it, and the contents are as expected.

Objective:

Now, I need to print this data to a file, in the following format:

Customer_1 Fav_TV_Show_1
Customer_1 Fav_TV_Show_2
Customer_1 Fav_TV_Show_3
Customer_1 Fav_TV_Show_4
Customer_2 Fav_TV_Show_1
Customer_2 Fav_TV_Show_2
Customer_2 Fav_TV_Show_3
Customer_2 Fav_TV_Show_4
Customer_3 Fav_TV_Show_1
Customer_3 Fav_TV_Show_2
Customer_3 Fav_TV_Show_3
Customer_3 Fav_TV_Show_4

Problem:

I don't know how to do that. So far, I have tried this:

// Need a flat pair back
JavaPairRDD<String, Shows> resultPairs = result.mapToPair(
        new PairFunction<Tuple2<String, Shows[]>, String, Shows>() {
            public Tuple2<String, Shows> call(Tuple2<String, Shows[]> t) {
                // But this won't work, as I have to return multiple <Customer - Show> pairs
            }
        });

Any help is much appreciated.

Bhushan asked Mar 17 '23 16:03


1 Answer

It's a bit unusual to end up with a JavaRDD<Tuple2<String, Shows[]>> rather than a JavaPairRDD<String, Shows[]>, which is more convenient for key-value data. Either way, you can flatten the result as follows:

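import java.util.Arrays;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

// convert your RDD into a PairRDD (JavaPairRDD.fromJavaRDD(result) does the same conversion)
JavaPairRDD<String, Shows[]> pairs = result.mapToPair(new PairFunction<Tuple2<String, Shows[]>, String, Shows[]>() {
    public Tuple2<String, Shows[]> call(Tuple2<String, Shows[]> t) throws Exception {
        return t;
    }
});

// now flatMap the values so that each show is paired with its customer key
JavaPairRDD<String, Shows> output = pairs.flatMapValues(
    new Function<Shows[], Iterable<Shows>>() {
        public Iterable<Shows> call(Shows[] shows) throws Exception {
            return Arrays.asList(shows);
        }
    });

// do something else with them, e.g. print each pair (this relies on Shows.toString())
output.foreach(new VoidFunction<Tuple2<String, Shows>>() {
    public void call(Tuple2<String, Shows> t) throws Exception {
        System.out.println(t._1() + " " + t._2());
    }
});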
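Note that the foreach above prints to standard output, and on a cluster that output lands in the executor logs rather than in a file. Since the goal is a file with one "customer show" pair per line, here is a minimal plain-Java sketch of the formatting and writing step. It runs without Spark; the class PairLines, its methods, and the use of plain strings in place of Shows objects are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: turns flat (customer, show) pairs into lines.
final class PairLines {

    // One output line per pair: "<customer> <show>".
    static String toLine(Map.Entry<String, String> pair) {
        return pair.getKey() + " " + pair.getValue();
    }

    // Formats every pair, keeping the order of the input list.
    static List<String> toLines(List<Map.Entry<String, String>> pairs) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> pair : pairs) {
            lines.add(toLine(pair));
        }
        return lines;
    }

    // Writes one line per pair to the given file.
    static void writeLines(Path file, List<Map.Entry<String, String>> pairs) throws IOException {
        Files.write(file, toLines(pairs), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<String, String>("Customer_1", "Fav_TV_Show_1"));
        pairs.add(new SimpleEntry<String, String>("Customer_1", "Fav_TV_Show_2"));

        // A temp file is used here only so the sketch can run anywhere.
        Path out = Files.createTempFile("customer-shows", ".txt");
        writeLines(out, pairs);
        System.out.println("Wrote " + out);
    }
}
```

With Spark itself, the matching step would likely be to map each pair to such a string and call saveAsTextFile(path) on the resulting RDD. Keep in mind that saveAsTextFile writes a directory of part files, not a single file.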

Alternatively, you can get the same output RDD in one step with flatMapToPair, building the list of (customer, show) pairs by hand:

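import java.util.ArrayList;
import org.apache.spark.api.java.function.PairFlatMapFunction;

// Spark 1.x signature; from Spark 2.0 on, call returns an Iterator, so return ret.iterator() there
JavaPairRDD<String, Shows> output = result.flatMapToPair(
    new PairFlatMapFunction<Tuple2<String, Shows[]>, String, Shows>() {
        public Iterable<Tuple2<String, Shows>> call(Tuple2<String, Shows[]> t) throws Exception {
            ArrayList<Tuple2<String, Shows>> ret = new ArrayList<>();
            // emit one (customer, show) pair per element of the array
            for (Shows s : t._2()) {
                ret.add(new Tuple2<>(t._1(), s));
            }
            return ret;
        }
    });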
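If you want to check the one-to-many expansion without a Spark cluster, the same idea can be sketched with plain Java streams. This is only an illustration of the flattening logic, not Spark code: the class FlattenSketch, the method flatten, and the use of String show names instead of Shows objects are all assumptions.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java stand-in for the Spark flattening; show names are Strings here, not Shows objects.
final class FlattenSketch {

    // Expands each (customer, shows[]) entry into one (customer, show) pair per array element.
    static List<Map.Entry<String, String>> flatten(Map<String, String[]> favorites) {
        return favorites.entrySet().stream()
                .flatMap(e -> Arrays.stream(e.getValue())
                        .<Map.Entry<String, String>>map(
                                show -> new SimpleEntry<String, String>(e.getKey(), show)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so customers come out in the order they were added.
        Map<String, String[]> favorites = new LinkedHashMap<>();
        favorites.put("Customer_1", new String[] {"Fav_TV_Show_1", "Fav_TV_Show_2"});
        favorites.put("Customer_2", new String[] {"Fav_TV_Show_1", "Fav_TV_Show_2"});

        // Print each flattened pair as "<customer> <show>".
        for (Map.Entry<String, String> pair : flatten(favorites)) {
            System.out.println(pair.getKey() + " " + pair.getValue());
        }
    }
}
```

The stream flatMap plays the role that flatMapToPair plays on the RDD: each input element may produce zero, one, or many output pairs.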

Hope it helped. Cheers!

ale64bit answered Mar 19 '23 12:03