I have a Dataset<Tuple2<String,DeviceData>> and want to transform it to an Iterator<DeviceData>.
Below is my code, where I use the collectAsList() method and then get the Iterator<DeviceData>.
Dataset<Tuple2<String,DeviceData>> ds = ...;
List<Tuple2<String, DeviceData>> listTuple = ds.collectAsList();
ArrayList<DeviceData> myDataList = new ArrayList<DeviceData>();
for (Tuple2<String, DeviceData> tuple : listTuple) {
    myDataList.add(tuple._2());
}
Iterator<DeviceData> myitr = myDataList.iterator();
I cannot use collectAsList() because my data is huge and it will hamper performance. I looked into the Dataset API but couldn't find a solution, and searching online didn't turn up an answer either. Can someone please guide me? A solution in Java would be great. Thanks.
EDIT: The DeviceData class is a simple JavaBean. Here is the printSchema() output for ds:
root
 |-- value: string (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- deviceData: string (nullable = true)
 |    |-- deviceId: string (nullable = true)
 |    |-- sNo: integer (nullable = true)
You can extract DeviceData directly from ds with a map, instead of collecting the rows and building the list again.
Java:
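import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Encoders;

// Dataset.map in the Java API takes a MapFunction plus an Encoder for the result type.
MapFunction<Tuple2<String, DeviceData>, DeviceData> mapDeviceData =
    new MapFunction<Tuple2<String, DeviceData>, DeviceData>() {
        @Override
        public DeviceData call(Tuple2<String, DeviceData> tuple) {
            return tuple._2();
        }
    };
// Extracts DeviceData from each record; DeviceData is a JavaBean, so a bean encoder is used.
Dataset<DeviceData> ddDS = ds.map(mapDeviceData, Encoders.bean(DeviceData.class));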
Scala:
val ddDS = ds.map(_._2) // equivalent to ds.map(row => row._2)
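The question asks for an Iterator<DeviceData> rather than a Dataset. Dataset has a toLocalIterator() method that returns a java.util.Iterator and fetches partitions to the driver one at a time, so it may be worth trying on the mapped Dataset instead of collectAsList(). If you iterate the tuple Dataset directly, the second element can also be exposed lazily with a small adapter. The following is only a Spark-free sketch of that adapter: Map.Entry stands in for Tuple2, and the class name LazyValueIterator and the sample values are made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Spark-free sketch: Map.Entry stands in for Tuple2, and the list iterator
// stands in for whatever iterator the driver receives (an assumption).
class LazyValueIterator {

    // Wraps an iterator of pairs and exposes only the second element,
    // converting one item per next() call instead of building a new list.
    static <K, V> Iterator<V> values(final Iterator<Map.Entry<K, V>> source) {
        return new Iterator<V>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public V next() {
                return source.next().getValue();
            }
        };
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; strings are used in place of DeviceData beans.
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        rows.add(new SimpleEntry<String, String>("key-1", "device-A"));
        rows.add(new SimpleEntry<String, String>("key-2", "device-B"));

        Iterator<String> it = values(rows.iterator());
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

With Spark, the source iterator would come from the Dataset itself. If you call toLocalIterator() on the already mapped Dataset<DeviceData>, no adapter is needed at all.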