I have a project where, in different scenarios, I have to work on different subsets of a large dataset. The way I have written the code, there is a Collector interface, and a class DataCollector implements Collector. The class DataCollector is instantiated with the condition used to create the subset, and these conditions are enum constants.
Let's say the dataset is a set of 1 million English words, and I want to work on the subset of words consisting of an odd number of letters. Then I do the following:
DataCollector dataCollector = new DataCollector(CollectionType.WORDS_OF_ODD_LENGTH);
Set<String> oddLengthWords = dataCollector.collect();
where CollectionType is the enum:
enum CollectionType {
WORDS_OF_ODD_LENGTH,
WORDS_OF_EVEN_LENGTH,
STARTING_WITH_VOWEL,
STARTING_WITH_CONSONANT,
....
}
The data collector applies a java.util.function.Predicate chosen according to the enum constant it was instantiated with.
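The question doesn't show DataCollector's internals, so here is a minimal sketch of the single-enum design as described, assuming the words are already held in memory and passed to the constructor (the constructor signature and the switch are illustrative, not the asker's actual code). It makes the scaling problem visible: every new scenario needs both a new constant and a new case.

```java
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical reconstruction of the single-condition design from the question.
interface Collector {
    Set<String> collect();
}

enum CollectionType {
    WORDS_OF_ODD_LENGTH,
    WORDS_OF_EVEN_LENGTH,
    STARTING_WITH_VOWEL,
    STARTING_WITH_CONSONANT
}

class DataCollector implements Collector {
    private final Set<String> dataset;          // assumption: words already in memory
    private final Predicate<String> predicate;  // chosen once, from the enum constant

    DataCollector(Set<String> dataset, CollectionType type) {
        this.dataset = dataset;
        this.predicate = predicateFor(type);
    }

    // One predicate per constant: each new scenario means a new constant AND a new case.
    private static Predicate<String> predicateFor(CollectionType type) {
        switch (type) {
            case WORDS_OF_ODD_LENGTH:
                return s -> s.length() % 2 != 0;
            case WORDS_OF_EVEN_LENGTH:
                return s -> s.length() % 2 == 0;
            case STARTING_WITH_VOWEL:
                return s -> !s.isEmpty() && "AEIOUaeiou".indexOf(s.charAt(0)) >= 0;
            case STARTING_WITH_CONSONANT:
                // "consonant" here simply means "does not start with a vowel"
                return s -> !s.isEmpty() && "AEIOUaeiou".indexOf(s.charAt(0)) < 0;
            default:
                throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    @Override
    public Set<String> collect() {
        return dataset.stream().filter(predicate).collect(Collectors.toSet());
    }
}
```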
So far, this approach has been robust and flexible enough, but now I am facing increasingly complex scenarios (e.g., collecting words of even length that start with a vowel). I would like to avoid adding a new CollectionType constant for every such scenario. What I have noticed is that many of these complex scenarios are just logical combinations of the simpler ones (e.g., condition_1 && (condition_2 || condition_3)).
The end-user is the one who specifies these conditions, and the only control I have is over which conditions are offered: the end-user can only select from CollectionType. Right now, I am trying to generalize from selecting exactly one condition to selecting one or more. For that, I need something like
DataCollector dataCollector = new DataCollector(WORDS_OF_ODD_LENGTH && STARTING_WITH_VOWEL);
Is there a way to model my enums so that they support such operations? I am also open to other ideas (e.g., scrapping this enum-based approach for something else).
I suggest you use Java 8, which has Predicate and default methods (and, or, negate) for combining predicates.
import java.util.function.Predicate;

enum CollectionType implements Predicate<String> {
    WORDS_OF_ODD_LENGTH(s -> s.length() % 2 != 0),
    WORDS_OF_EVEN_LENGTH(WORDS_OF_ODD_LENGTH.negate()),
    // guard against empty strings before calling charAt(0)
    STARTING_WITH_VOWEL(s -> !s.isEmpty() && isVowel(s.charAt(0))),
    STARTING_WITH_CONSONANT(STARTING_WITH_VOWEL.negate()),
    COMPLEX_CHECK(CollectionType::complexCheck);

    private final Predicate<String> predicate;

    CollectionType(Predicate<String> predicate) {
        this.predicate = predicate;
    }

    static boolean isVowel(char c) {
        return "AEIOUaeiou".indexOf(c) >= 0;
    }

    @Override
    public boolean test(String s) {
        return predicate.test(s);
    }

    public static boolean complexCheck(String s) {
        // many lines of code, calling many methods
        return false; // placeholder so the enum compiles; replace with the real check
    }
}
Then you can write a Predicate like
Predicate<String> p = WORDS_OF_ODD_LENGTH.and(STARTING_WITH_CONSONANT);
or even one for five-letter words starting with a vowel:
Predicate<String> p = STARTING_WITH_VOWEL.and(s -> s.length() == 5);
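As a sketch of the condition_1 && (condition_2 || condition_3) case from the question, the block below composes the enum constants with and/or instead of adding a new constant. It repeats a trimmed copy of the enum (COMPLEX_CHECK omitted) so it compiles on its own; the ComposeDemo class, the select helper and the extra length == 5 condition are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ComposeDemo {
    // Trimmed copy of the enum above so this block is self-contained.
    enum CollectionType implements Predicate<String> {
        WORDS_OF_ODD_LENGTH(s -> s.length() % 2 != 0),
        WORDS_OF_EVEN_LENGTH(WORDS_OF_ODD_LENGTH.negate()),
        STARTING_WITH_VOWEL(s -> !s.isEmpty() && isVowel(s.charAt(0))),
        STARTING_WITH_CONSONANT(STARTING_WITH_VOWEL.negate());

        private final Predicate<String> predicate;

        CollectionType(Predicate<String> predicate) {
            this.predicate = predicate;
        }

        static boolean isVowel(char c) {
            return "AEIOUaeiou".indexOf(c) >= 0;
        }

        @Override
        public boolean test(String s) {
            return predicate.test(s);
        }
    }

    // Keeps the words that satisfy the (possibly composed) condition, in input order.
    static List<String> select(List<String> words, Predicate<String> condition) {
        return words.stream().filter(condition).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "echo", "owl", "tree", "idea", "cat");
        // STARTING_WITH_VOWEL && (WORDS_OF_EVEN_LENGTH || length == 5)
        Predicate<String> p = CollectionType.STARTING_WITH_VOWEL
                .and(CollectionType.WORDS_OF_EVEN_LENGTH.or(s -> s.length() == 5));
        System.out.println(select(words, p)); // [apple, echo, idea]
    }
}
```

Because and/or/negate return ordinary Predicate objects, the end-user's selection can be combined at runtime without touching the enum.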
If you want to apply this filter while reading the file, you can do
List<String> oddWords = Files.lines(path).filter(WORDS_OF_ODD_LENGTH).collect(toList());
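To connect this back to the question's DataCollector, one option (a sketch, not the only possible design) is to have it hold a Predicate<String> instead of a single CollectionType, and to AND together whichever constants the end-user selected. The wrapper class, the allOf factory and the collect(Stream<String>) signature are hypothetical; with this sketch an empty selection matches every word.

```java
import java.util.Collection;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PredicateCollectorDemo {
    // Trimmed copy of the enum, so this block compiles on its own.
    enum CollectionType implements Predicate<String> {
        WORDS_OF_ODD_LENGTH(s -> s.length() % 2 != 0),
        WORDS_OF_EVEN_LENGTH(WORDS_OF_ODD_LENGTH.negate()),
        STARTING_WITH_VOWEL(s -> !s.isEmpty() && "AEIOUaeiou".indexOf(s.charAt(0)) >= 0);

        private final Predicate<String> predicate;

        CollectionType(Predicate<String> predicate) {
            this.predicate = predicate;
        }

        @Override
        public boolean test(String s) {
            return predicate.test(s);
        }
    }

    // Hypothetical variant of DataCollector that stores a composed predicate.
    static class DataCollector {
        private final Predicate<String> predicate;

        DataCollector(Predicate<String> predicate) {
            this.predicate = predicate;
        }

        // ANDs together every constant the end-user selected.
        static DataCollector allOf(Collection<CollectionType> selected) {
            Predicate<String> all = s -> true; // identity for AND
            for (CollectionType type : selected) {
                all = all.and(type);
            }
            return new DataCollector(all);
        }

        Set<String> collect(Stream<String> words) {
            return words.filter(predicate).collect(Collectors.toSet());
        }
    }
}
```

For example, DataCollector.allOf(EnumSet.of(WORDS_OF_ODD_LENGTH, STARTING_WITH_VOWEL)) plays the role of the WORDS_OF_ODD_LENGTH && STARTING_WITH_VOWEL argument from the question. When reading from a file, open Files.lines(path) in a try-with-resources block and pass that stream to collect, so the file handle is closed afterwards.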
Or you could index the words by length as you load them:
Map<Integer, List<String>> wordsBySize = Files.lines(path)
        .collect(groupingBy(String::length));
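A self-contained version of the grouping step, using an in-memory list in place of Files.lines(path) (the IndexByLength class and wordsBySize method names are illustrative). Collectors.groupingBy with no downstream collector puts each group in a List, preserving the original order within the group.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class IndexByLength {
    // Maps each word length to the words of that length.
    static Map<Integer, List<String>> wordsBySize(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> index =
                wordsBySize(Arrays.asList("cat", "tree", "owl", "apple", "echo"));
        System.out.println(index.get(3)); // [cat, owl]
        System.out.println(index.get(4)); // [tree, echo]
    }
}
```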
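Even though your enum is a Predicate, you can still optimise how it is used. The odd/even-length predicates give the same answer for every word of a given length, so you only need to test one word per length group: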
if (predicate == WORDS_OF_ODD_LENGTH || predicate == WORDS_OF_EVEN_LENGTH) {
// every word in a group has the same length, so if the first word
// passes, the whole group passes.
return wordsBySize.values().stream()
.filter(l -> predicate.test(l.get(0)))
.flatMap(l -> l.stream()).collect(toList());
} else {
return wordsBySize.values().stream()
.flatMap(l -> l.stream())
.filter(predicate)
.collect(toList());
}
i.e. by using an enum, you can recognise some predicates and optimise for them. (Whether that is a good idea or not, I will leave to you.)
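Putting the optimisation into a compilable method might look like the sketch below; the LengthShortcut class and select method are hypothetical names, and the enum is trimmed to the constants used. The shortcut relies on an identity (==) check, so it only fires for the plain WORDS_OF_ODD_LENGTH and WORDS_OF_EVEN_LENGTH constants; a composed predicate such as WORDS_OF_ODD_LENGTH.and(...) is a different object and takes the general path.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class LengthShortcut {
    enum CollectionType implements Predicate<String> {
        WORDS_OF_ODD_LENGTH(s -> s.length() % 2 != 0),
        WORDS_OF_EVEN_LENGTH(WORDS_OF_ODD_LENGTH.negate()),
        STARTING_WITH_VOWEL(s -> !s.isEmpty() && "AEIOUaeiou".indexOf(s.charAt(0)) >= 0);

        private final Predicate<String> predicate;

        CollectionType(Predicate<String> predicate) {
            this.predicate = predicate;
        }

        @Override
        public boolean test(String s) {
            return predicate.test(s);
        }
    }

    static List<String> select(Map<Integer, List<String>> wordsBySize, Predicate<String> predicate) {
        if (predicate == CollectionType.WORDS_OF_ODD_LENGTH
                || predicate == CollectionType.WORDS_OF_EVEN_LENGTH) {
            // Length predicates agree for every word in a group, so test one
            // representative per group and keep or drop the whole group.
            // groupingBy never creates empty groups, so get(0) is safe.
            return wordsBySize.values().stream()
                    .filter(group -> predicate.test(group.get(0)))
                    .flatMap(List::stream)
                    .collect(Collectors.toList());
        }
        // Any other predicate, including composed ones, must test every word.
        return wordsBySize.values().stream()
                .flatMap(List::stream)
                .filter(predicate)
                .collect(Collectors.toList());
    }
}
```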