Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter out items from Set in Java

Tags:

java

I have a list of items(i.e Strings) which I need to sort/filter.

The end result should not contain any duplicate (easy), I'll put them all in the Set. So I have a Set of strings now.

more explanation..

I also have a method x that calculates the amount of difference between two Strings (using levenstein distance).

Question:

Before inserting new String string into my Set set I want to check for levenstein distance using method x between string and any other String in the set and if x returns >=3 than I should not add it.

What is my best shot of doing this? Except iterate trough set for each string to be inserted?

like image 701
Gandalf StormCrow Avatar asked May 23 '12 16:05

Gandalf StormCrow


3 Answers

Iterating through the Set is going to be your best bet, since there isn't any built-in Set implementation that would help you narrow the possibilities.

like image 184
Louis Wasserman Avatar answered Oct 17 '22 03:10

Louis Wasserman


I have played with my idea of how to do it. I cannot think of a way to do this without any amount of iteration.

Supposing you have method named distance(String,String):int that returns the given distance between two Strings.

String x = "Obi-wan"; //this is the item subject to eval addition
List<String> items = new ArrayList<String>(asList("Luke","Yoda","Anakin"));
if (items.filter(s -> distance(s, x) >= 3).getFirst() == null) {
  items.add(x);
}

If you use JDK8 Preview you can do this in no time using exactly the code above. The Iterables.getFirst() method would not iterate the whole collection, but only until the first element satisfying the criteria is found.

Otherwise you will probably have to implement a Predicate interface and filtering method.

interface Predicate<T> {
    public boolean eval(T o);
}

public static void main(String[] args) {
   final String x = "Obi-wan"; //this is the item subject to eval addition
   List<String> items = new ArrayList<String>(asList("Luke","Yoda","Anakin"));
   Predicate<String> p = new Predicate<String>() {
       public boolean eval(String s){ 
           return distance(s, x) >= 3;
       }
   };
   if(filter(items, p).isEmpty()){ 
        items.add(x);
   }
}

public static <T> List<T> filter(List<? extends T> items, Predicate<? super T> predicate){
    List<T> destiny = new ArrayList<T>();
    for(T item : items){
       if(predicate.eval(item){
           destiny.add(item);
       }
    }
    return destiny;
}

Alternatively, you could stop filtering once you find the first item that satisfies your criteria.

like image 43
Edwin Dalorzo Avatar answered Oct 17 '22 03:10

Edwin Dalorzo


You can use a custom comparator when creating the set. In your comparator you return that two strings are the same if they are the same (as per regular string comparison rules) or if their Levenstein distance fulfills your criteria.

When your comaprator says two strings are the same, the new string is not inserted into the set. (Note that this means the end result of the string might depend on the order of insertion)

Update: Addressing comments about total ordering:

Using a comparator like the one suggested above would make the endresult dependent on the order of insertion (as noted above), as would any other solution as the Levenstein distance criteria used does not define total ordering.

OTOH, once a string passes the not-equal test and is inserted into the set, no other string in the set will compare equal to this one, so the strings in the set will use their natural string ordering, which does define total ordering, so no further inconsistencies arise within the set's internal operations (e.g. sorting).

like image 1
3 revs Avatar answered Oct 17 '22 04:10

3 revs