Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java: How to filter a big collection of objects with a big collection of predicates?

In Java, I have a big collection of objects (~ 10,000 objects), say Set<Person> cityInhabitants. I also have a big collection of predicates (~ 1,000 predicates) which shall be used to filter away any Person matching any of these predicates. Predicates could be for example

  • person.getName().equals("ugly name1")
  • person.getName().equals("ugly name2")
  • person.getAge() < 18.

This requirement calls for the following challenges:

  • the filtering shall be fast
  • the predicates are "business defined" and therefore it shall be easy to add and remove predicates. That means the predicates probably shouldn't be hard-coded in source code, but better be maintained in a database (?)

What are solutions to these challenges? Are there any libraries that can help here?

like image 943
Abdull Avatar asked Jan 28 '13 23:01

Abdull


2 Answers

In this situation there is not much you can do in relation to the complexity of the operation itself. If entries are many, predicates are many and predicates are expensive then you can optimize to be fast as you can but you won't surely get over a certain threshold because the single operation here maybe expensive.

You should test different approaches and see whatever performs better in your specific situation:

  • sort predicates according by first checking the ones that should be wider (so that the first predicates will filter out as many entries as possible)
  • sort predicates according to their complexity (so that faster will be executed first and the slower on less entries)
  • don't update the original data structure but keep a parallel set that will contain the filtered elements vs
  • always update the data structure so that you will loop over smaller amount of people everytime
like image 57
Jack Avatar answered Oct 18 '22 21:10

Jack


I would suggest you sort the predicates in order of speed of execution. You can then execute your predicates in order of speed, using the fastest ones first, generally meaning the slower predicates will have to run over a smaller set.

However this assumption is not completely correct, you would need to work out the percentage of predicates removed to speed of execution. Then we can see which is the fastest predicate that removes the highest percentage of objects. We can then execute the predicates in this order to me most optimal.

You can easily implement your own predicate interface

public interface Predicate<T> {

    boolean filter(T object);

}

You would then need to create predicate object for each of the rules. You can make some more dynamic classes for age and name checking which will reduce the amount of code you will need also.

public class AgeCheck<T> implements Predicate<T> {

    private final int min;
    private final int max;
    public AgeCheck(int min, int max) {
        this.min = min;
        this.max = max;
    }

    @Override
    public boolean filter(T object) {
        // if( t.age() < max && t.age > min) ...
    }

}
like image 22
Tom Cammann Avatar answered Oct 18 '22 21:10

Tom Cammann