How to sort a dataset in Apache Flink?

Tags:

apache-flink

I have a tuple DataSet of the form DataSet<Tuple2<String, Long>>. I want to sort the entire DataSet on the String field and then write only the Long values to a file. Flink provides sortPartition(), but that only sorts within each partition, and I need a total order over the whole DataSet.

Sagar asked Apr 01 '17 11:04


1 Answer

You can also use sortPartition() to sort the complete DataSet if you set the parallelism to 1:

DataSet<Tuple2<String, Long>> data = ...
DataSet<Tuple2<String, Long>> sorted = data
  .sortPartition(0, Order.ASCENDING).setParallelism(1); // sort in one partition
DataSet<Long> longs = sorted.map(new LongExtractor());  // map to extract long
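
The snippet above does not define LongExtractor, so here is a minimal sketch of how the whole job might look, assuming a Flink version that still ships the DataSet API. The class name SortedLongsJob, the sample records, and the output path argument are hypothetical and only there for illustration. The sketch also sets the parallelism of the map and the sink to 1, which goes beyond the code above: the assumption is that a change in parallelism after the sort would redistribute the records and the result would no longer be one sorted file.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem.WriteMode;

public class SortedLongsJob {

    // Hypothetical implementation of the LongExtractor named in the answer:
    // it keeps only the Long field (f1) of each tuple.
    public static class LongExtractor implements MapFunction<Tuple2<String, Long>, Long> {
        @Override
        public Long map(Tuple2<String, Long> value) {
            return value.f1;
        }
    }

    // Sorts a small sample DataSet on the String field and writes the Long values to outputPath.
    public static void writeSortedLongs(ExecutionEnvironment env, String outputPath) throws Exception {
        // Sample input, made up for this sketch.
        DataSet<Tuple2<String, Long>> data = env.fromElements(
                Tuple2.of("pear", 3L),
                Tuple2.of("apple", 1L),
                Tuple2.of("fig", 2L));

        DataSet<Long> longs = data
                // A single partition holds every record, so sorting it sorts the whole DataSet.
                .sortPartition(0, Order.ASCENDING).setParallelism(1)
                // Assumption: keep parallelism 1 so the sorted order is not redistributed.
                .map(new LongExtractor()).setParallelism(1);

        // A sink with parallelism 1 writes one file instead of a directory of part files.
        longs.writeAsText(outputPath, WriteMode.OVERWRITE).setParallelism(1);

        env.execute("sort DataSet and extract longs");
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        writeSortedLongs(env, args[0]);
    }
}
```

Because every record passes through a single task, this approach is only practical when the data fits comfortably on one worker.
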
Fabian Hueske answered Dec 17 '22 08:12