I have a list of items(i.e Strings) which I need to sort/filter.
The end result should not contain any duplicate (easy), I'll put them all in the Set. So I have a Set of strings now.
more explanation..
I also have a method x that calculates the amount of difference between two Strings (using levenstein distance).
Question:
Before inserting new String string
into my Set set
I want to check for levenstein distance using method x
between string
and any other String in the set
and if x
returns >=3
than I should not add it.
What is my best shot of doing this? Except iterate trough set
for each string
to be inserted?
Iterating through the Set
is going to be your best bet, since there isn't any built-in Set
implementation that would help you narrow the possibilities.
I have played with my idea of how to do it. I cannot think of a way to do this without any amount of iteration.
Supposing you have method named distance(String,String):int
that returns the given distance between two Strings.
String x = "Obi-wan"; //this is the item subject to eval addition
List<String> items = new ArrayList<String>(asList("Luke","Yoda","Anakin"));
if (items.filter(s -> distance(s, x) >= 3).getFirst() == null) {
items.add(x);
}
If you use JDK8 Preview you can do this in no time using exactly the code above. The Iterables.getFirst() method would not iterate the whole collection, but only until the first element satisfying the criteria is found.
Otherwise you will probably have to implement a Predicate interface and filtering method.
interface Predicate<T> {
public boolean eval(T o);
}
public static void main(String[] args) {
final String x = "Obi-wan"; //this is the item subject to eval addition
List<String> items = new ArrayList<String>(asList("Luke","Yoda","Anakin"));
Predicate<String> p = new Predicate<String>() {
public boolean eval(String s){
return distance(s, x) >= 3;
}
};
if(filter(items, p).isEmpty()){
items.add(x);
}
}
public static <T> List<T> filter(List<? extends T> items, Predicate<? super T> predicate){
List<T> destiny = new ArrayList<T>();
for(T item : items){
if(predicate.eval(item){
destiny.add(item);
}
}
return destiny;
}
Alternatively, you could stop filtering once you find the first item that satisfies your criteria.
You can use a custom comparator when creating the set. In your comparator you return that two strings are the same if they are the same (as per regular string comparison rules) or if their Levenstein distance fulfills your criteria.
When your comaprator says two strings are the same, the new string is not inserted into the set. (Note that this means the end result of the string might depend on the order of insertion)
Update: Addressing comments about total ordering:
Using a comparator like the one suggested above would make the endresult dependent on the order of insertion (as noted above), as would any other solution as the Levenstein distance criteria used does not define total ordering.
OTOH, once a string passes the not-equal test and is inserted into the set, no other string in the set will compare equal to this one, so the strings in the set will use their natural string ordering, which does define total ordering, so no further inconsistencies arise within the set's internal operations (e.g. sorting).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With