
How to transform Dataset<Tuple2<String,DeviceData>> to Iterator<DeviceData>

I have a Dataset<Tuple2<String,DeviceData>> and want to transform it into an Iterator<DeviceData>.

Below is my code, which calls collectAsList() and then builds an Iterator<DeviceData> from the result.

Dataset<Tuple2<String,DeviceData>> ds = ...;
List<Tuple2<String, DeviceData>> listTuple = ds.collectAsList();

ArrayList<DeviceData> myDataList = new ArrayList<DeviceData>();
for(Tuple2<String, DeviceData> tuple : listTuple){
    myDataList.add(tuple._2());
}

Iterator<DeviceData> myitr = myDataList.iterator();

I cannot use collectAsList() because my data is huge and collecting all of it on the driver will hurt performance. I looked through the Dataset API and searched online but couldn't find a solution. Can someone please guide me? A solution in Java would be great. Thanks.

EDIT:

The DeviceData class is a simple JavaBean. Here is the printSchema() output for ds.

root
 |-- value: string (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- deviceData: string (nullable = true)
 |    |-- deviceId: string (nullable = true)
 |    |-- sNo: integer (nullable = true)
user7615505 asked Feb 25 '17 13:02


1 Answer

You can extract DeviceData directly from ds with a map, instead of collecting the tuples and rebuilding a list.

Java:

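// Requires org.apache.spark.api.java.function.MapFunction and org.apache.spark.sql.Encoders
MapFunction<Tuple2<String, DeviceData>, DeviceData> mapDeviceData =
    new MapFunction<Tuple2<String, DeviceData>, DeviceData>() {
      @Override
      public DeviceData call(Tuple2<String, DeviceData> tuple) {
        return tuple._2();
      }
    };

// Extracts DeviceData from each record. In Java, Dataset.map takes a MapFunction
// plus an explicit Encoder; a bean encoder fits because DeviceData is a JavaBean.
Dataset<DeviceData> ddDS = ds.map(mapDeviceData, Encoders.bean(DeviceData.class));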

Scala:

val ddDS = ds.map(_._2) // equivalent to ds.map(row => row._2)
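
Neither snippet above yet produces the Iterator<DeviceData> the question asks for. One option, assuming Spark 2.0 or later, is Dataset.toLocalIterator(), which returns a java.util.Iterator and, per its documentation, consumes about as much driver memory as the largest partition rather than the whole Dataset. Called on ddDS it would give the iterator directly; called on ds it gives an iterator of tuples, which can be projected lazily. The sketch below shows only that lazy projection, in plain Java so it runs without Spark on the classpath: Map.Entry stands in for Tuple2, and both the DeviceData class and the mapLazily helper are hypothetical stand-ins, not part of the original code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the asker's bean; only deviceId is modeled here.
class DeviceData {
    private final String deviceId;
    DeviceData(String deviceId) { this.deviceId = deviceId; }
    String getDeviceId() { return deviceId; }
}

class LazyProjection {
    // Wraps a source iterator and applies fn to each element only when next()
    // is called, so no intermediate list is ever built.
    static <A, B> Iterator<B> mapLazily(Iterator<A> source, Function<A, B> fn) {
        return new Iterator<B>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public B next() { return fn.apply(source.next()); }
        };
    }

    public static void main(String[] args) {
        // Map.Entry stands in for Tuple2<String, DeviceData> (assumption for this
        // sketch); with Spark the source would be the iterator from ds.toLocalIterator().
        List<Map.Entry<String, DeviceData>> rows = Arrays.asList(
            new SimpleEntry<String, DeviceData>("k1", new DeviceData("dev-1")),
            new SimpleEntry<String, DeviceData>("k2", new DeviceData("dev-2")));

        Iterator<DeviceData> it = mapLazily(rows.iterator(), Map.Entry::getValue);
        while (it.hasNext()) {
            System.out.println(it.next().getDeviceId()); // prints dev-1, then dev-2
        }
    }
}
```

With Spark, the projection passed to the helper would be tuple -> tuple._2() instead of Map.Entry::getValue. Mapping on the cluster first and then calling toLocalIterator() on the result avoids the helper entirely; the helper is only useful when the tuples are already being iterated on the driver.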
mrsrinivas answered Sep 23 '22 16:09