I'm trying be able to compare two Strings and identify duplicate words. For example;
String1 = "Hello, my name is John."
String2 = "Can you tell me your name please?"
Comparing String1 and String2 would return the word; "name".
I know it is possible to split these two strings into an array of words, and then iterate over each word of each String in a 2-D array. However this is computationally expensive at O(n^2) and I was wondering if there is a faster way of doing this?
Thanks.
EDIT: Changed the example for clarity.
To compare these strings in Java, we need to use the equals() method of the string. You should not use == (equality operator) to compare these strings because they compare the reference of the string, i.e. whether they are the same object or not.
One solution is to use Java compareTo() method. The method compareTo() is used for comparing two strings lexicographically in Java.
After getting the strings to word arrays:
You can add all the elements in the first array to a hashmap and then scan the second array to see if each of the elements exists in the hashmap. Since access time to a hashmap is O(1), this will be O(n+m) time complexity.
If you do not want to use extra space, you can sort both of the arrays in O(nlogn) and then compare the items in O(n+m) which would give you O(nlogn) in total.
One simple solution is to use the Sets.intersection
method of Guava's Sets
. It is pretty easy:
String s1 = "Hello, my name is John.";
String s2 = "Can you tell me your name?";
Splitter splitter = Splitter.onPattern("\\W").trimResults().omitEmptyStrings();
Set<String> intersection = Sets.intersection(//
Sets.newHashSet(splitter.split(s1)), //
Sets.newHashSet(splitter.split(s2)));
System.out.println(intersection);
Output:
[name]
You can also find more information on algorithms to detect Set intersection on this thread.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With