 

Grouping doubles into a Map by arbitrary intervals using Java 8

I have data represented as a list of positive doubles, and a list of interval boundaries that is used to group the data. The boundary list is always sorted.
I tried to group the data with the following implementation:

    List<Integer> group = group();
    List<Double> data = DoubleStream.generate(new Random()::nextDouble)
            .limit(10)
            .map(d -> new Random().nextInt(30) * d)
            .boxed()
            .collect(Collectors.toList());
    HashMap<Integer, List<Double>> groupped = new HashMap<Integer, List<Double>>();
    data.stream().forEach(d -> {
        groupped.merge(getGroup(d, group), new ArrayList<Double>(Arrays.asList(d)), (l1, l2) -> {
            l1.addAll(l2);
            return l1;
        });
    });

    // Returns the lower bound of the interval that contains data.
    public static Integer getGroup(double data, List<Integer> group) {
        for (int i = 1; i < group.size(); i++) {
            if (group.get(i) > data) {
                return group.get(i - 1);
            }
        }
        return group.get(group.size() - 1);
    }

    // Lower bounds of the intervals, sorted ascending.
    public static List<Integer> group() {
        List<Integer> groups = new LinkedList<Integer>();
        // can be arbitrary grouping
        groups.add(0);
        groups.add(6);
        groups.add(11);
        groups.add(16);
        groups.add(21);
        groups.add(26);
        return groups;
    }

Is it possible to perform this kind of grouping/reducing directly on the data, through collectors?

In addition, regarding the complexity of the process: I think this takes n^2, since we iterate over two lists (or streams). It does not run in parallel at the moment, but I think getGroup() could be executed in parallel. Any insight on whether a TreeSet or a List should be used for better performance?

Eko Susilo asked Mar 12 '23 17:03



1 Answer

There are several improvements you can apply to your code. Random supports the Stream API (for example Random.doubles), so there is no need to generate your own DoubleStream.
Next, you should build your boundary set only once.
Finally, there is Collectors.groupingBy, which does the job for you. With the boundaries in a NavigableSet, floor() finds the matching lower bound for each value.


    import java.util.List;
    import java.util.Map;
    import java.util.NavigableSet;
    import java.util.Random;
    import java.util.TreeMap;
    import java.util.TreeSet;
    import java.util.stream.Collectors;

    public class Test {

      public static void main(String... args) {
        Random r = new Random();
        // Random.doubles(10) already yields a DoubleStream of 10 values in [0, 1).
        List<Double> data = r.doubles(10).map(d -> r.nextInt(30) * d).peek(System.out::println).boxed()
            .collect(Collectors.toList());
        // Build the boundary set once and reuse it for every element.
        NavigableSet<Integer> groups = group();
        // floor() returns the greatest boundary <= the truncated value;
        // it would return null for values below the smallest boundary.
        Map<Integer, List<Double>> groupped = data.stream()
            .collect(Collectors.groupingBy(d -> groups.floor(d.intValue()), TreeMap::new, Collectors.toList()));
        System.out.println(groupped);
      }

      // Lower bounds of the intervals, kept sorted by the TreeSet.
      public static NavigableSet<Integer> group() {
        NavigableSet<Integer> groups = new TreeSet<>();
        groups.add(0);
        groups.add(6);
        groups.add(11);
        groups.add(16);
        groups.add(21);
        groups.add(26);
        return groups;
      }
    }
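
On the performance part of the question: the linear scan in getGroup() costs O(m) per element for m boundaries, so the whole run is O(n·m) rather than n^2. TreeSet.floor() brings that down to O(n log m), and so does a binary search over a sorted, random-access List. The following is only a minimal sketch of the List variant combined with a parallel stream. The class name IntervalGrouping, the helper names groupOf and group, and the choice to clamp values below the first boundary to that boundary are assumptions made for illustration, not part of the code above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.stream.Collectors;

public class IntervalGrouping {

    // Lower bounds of the intervals, sorted ascending (same values as in the question).
    // Arrays.asList is random-access, which binarySearch needs to run in O(log m).
    public static final List<Integer> BOUNDS = Arrays.asList(0, 6, 11, 16, 21, 26);

    // Returns the greatest bound <= value, found by binary search.
    // Assumption: values below the first bound are clamped to the first bound.
    public static Integer groupOf(double value, List<Integer> bounds) {
        int idx = Collections.binarySearch(bounds, (int) Math.floor(value));
        if (idx < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1,
            // and the floor element sits just before the insertion point.
            idx = -idx - 2;
        }
        return bounds.get(Math.max(idx, 0));
    }

    // Groups in parallel into a sorted concurrent map.
    // The order of elements inside each list is not guaranteed.
    public static ConcurrentMap<Integer, List<Double>> group(List<Double> data) {
        return data.parallelStream()
            .collect(Collectors.groupingByConcurrent(
                d -> groupOf(d, BOUNDS), ConcurrentSkipListMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(0.5, 5.9, 6.0, 10.99, 16.2, 28.7);
        System.out.println(group(data));
    }
}
```

For a handful of boundaries and ten values the difference is unlikely to be measurable, and a parallel stream probably only pays off for much larger inputs. Note also that a LinkedList, as used in the question, is not random-access, so an ArrayList or a TreeSet is the better fit for the boundaries.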
Flown answered Mar 23 '23 22:03
