Long story short, I have a Java homework assignment that requires a long ArrayList of Strings to be manipulated in various ways (we're doing things like showing combinations of words and adding to and removing from the ArrayList, nothing too special). I noticed that a few of the provided ArrayLists have duplicate entries (and the duplicates aren't necessary for this assignment), so I got the okay from my teacher to sanitize the data by removing duplicate entries. Here's what I came up with:
private static ArrayList<String> KillDups(ArrayList<String> ListOfStrings) {  
    for (int i = 0 ; i < ListOfStrings.size(); i++) {
        for (int j = i + 1; j < ListOfStrings.size(); j++) {
            //don't start on the same word or you'll eliminate it.
            if ( ListOfStrings.get(i).toString().equalsIgnoreCase( ListOfStrings.get(j).toString() )  ) {
                ListOfStrings.remove(j);//if they are the same, DITCH ONE.
                j = j -1; //removing the word basically changes the index, so swing down one.
            }                                
        }
    }
    return ListOfStrings;
}
This is fine for my assignment, but I doubt it would be very useful in the real world. Is there a way to do this that would ignore white space and special characters during the comparison? Is there a cleaner way in general to handle this (maybe without the nested For Loops)? Is there another question I should be asking that I don't know to ask?
Yes. And it can be done in just 1 (elegant) line:
List<String> noDups = new ArrayList<String>(new LinkedHashSet<String>(list));
The intermediate Set ensures no duplicates. The LinkedHashSet implementation of Set was chosen to preserve the order of the list.
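To make the behaviour concrete, here is a minimal sketch of that one-liner in action. The class name DedupDemo, the helper name dedup and the sample words are made up for the example. Note that, unlike the equalsIgnoreCase loop in the question, this comparison is case-sensitive, so "apple" and "Apple" both survive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupDemo {
    // Returns a new list without duplicates, keeping the first occurrence of each
    // element in its original position (hypothetical helper name).
    static List<String> dedup(List<String> list) {
        return new ArrayList<String>(new LinkedHashSet<String>(list));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "pear", "Apple", "fig", "apple");
        System.out.println(dedup(words)); // prints [pear, apple, Apple, fig]
    }
}
```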
Also, on a style note:
Prefer the abstract type (ie List) rather than the concrete (ie ArrayList) when specifying method signatures.
Your whole method is then:
private static List<String> killDups(List<String> list) {
    return new ArrayList<String>(new LinkedHashSet<String>(list));
}
For extra brownie points make the method generic, so it works with any type of List:
private static <T> List<T> killDups(List<T> list) {
    return new ArrayList<T>(new LinkedHashSet<T>(list));
}
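As a rough usage sketch of the generic version (the class name GenericDedupDemo and the sample numbers are just for illustration), the same method works on a List<Integer>. It also returns a new list rather than modifying the one passed in, which differs from the loop in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class GenericDedupDemo {
    // Same body as the generic method above; package-private here so the demo can call it.
    static <T> List<T> killDups(List<T> list) {
        return new ArrayList<T>(new LinkedHashSet<T>(list));
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<Integer>(Arrays.asList(3, 1, 3, 2, 1));
        List<Integer> unique = killDups(numbers);
        System.out.println(unique);  // prints [3, 1, 2]
        System.out.println(numbers); // prints [3, 1, 3, 2, 1]; the input is untouched
    }
}
```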
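If you wanted to ignore certain characters, I'd create a wrapper class that compares strings that way and work with a list of those wrappers instead of raw Strings. Both the hashCode() and the equals() methods are relied upon by HashSets to remove dups, so the two must agree with each other: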
public class MungedString {
    // simplified code
    String s;
    public boolean equals(Object o) {
        // implement how you want to compare them here
    }
    public int hashCode() {
        // keep this consistent with equals()
    }
}
Then:
List<MungedString> list;
List<MungedString> noDupList = killDups(list);
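For a fuller picture, here is one possible way to fill in that skeleton. The comparison rule (ignore case and drop everything that isn't a letter or digit) is purely an assumption for the example, as are the names MungedDedupDemo, original, key and killDupsIgnoringNoise. Pick whatever rule suits your data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

public class MungedDedupDemo {
    // Wrapper whose equality ignores case and non-alphanumeric characters
    // (an assumed rule, chosen only for this sketch).
    static final class MungedString {
        final String original;
        private final String key;

        MungedString(String s) {
            this.original = s;
            // Strip anything that is not a letter or digit, then lower-case.
            this.key = s.replaceAll("[^\\p{L}\\p{Nd}]", "").toLowerCase(Locale.ROOT);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MungedString && key.equals(((MungedString) o).key);
        }

        @Override
        public int hashCode() {
            return key.hashCode(); // derived from the same key, so consistent with equals()
        }
    }

    // Wraps each string, lets the set drop the dups, then unwraps again.
    static List<String> killDupsIgnoringNoise(List<String> list) {
        LinkedHashSet<MungedString> seen = new LinkedHashSet<MungedString>();
        for (String s : list) {
            seen.add(new MungedString(s)); // add() keeps the first of any equal pair
        }
        List<String> result = new ArrayList<String>();
        for (MungedString m : seen) {
            result.add(m.original);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ice-cream", "Ice Cream", "cake", "CAKE!", "pie");
        System.out.println(killDupsIgnoringNoise(words)); // prints [ice-cream, cake, pie]
    }
}
```

Computing the normalized key once in the constructor keeps equals() and hashCode() cheap and guarantees they agree.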
Consider using a Set.
For the simplest case, which is a direct comparison of strings, a HashSet is what you want:
Set<String> mySet = new HashSet<String>();
mySet.addAll(aListWithDuplicatedStrings);
After that, mySet contains each distinct string exactly once. Note that a HashSet does not preserve the original order of the list.
For a case-insensitive comparison, I'll leave that to you as homework. Look at TreeSet and Comparator.
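If you get stuck, one possible sketch of that hint uses the JDK's built-in String.CASE_INSENSITIVE_ORDER comparator. The class and method names here are made up for the example. Be aware that a TreeSet returns its elements sorted by the comparator, not in their original list order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CaseInsensitiveDedupDemo {
    // Strings that differ only by case count as equal; the first one added is kept.
    static List<String> uniqueIgnoringCase(List<String> list) {
        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(list);
        return new ArrayList<String>(set);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "Apple", "PEAR", "apple", "fig");
        System.out.println(uniqueIgnoringCase(words)); // prints [Apple, fig, pear]
    }
}
```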