I am trying to implement modified version of Longest Common Substring where i do not want to split the words.
eg:
String s1 = "hi there how are you";
String s2 = "hi there how ar";
So the output should be:
o/p: hi there how.
I am first calculating the Longest common Substring and then filtering any words that are split and below is the code for the same:
private static void longestCommonSubstring(String S1, String S2) {
int Start = 0;
int Max = 0;
for (int i = 0; i < S1.length(); i++) {
for (int j = 0; j < S2.length(); j++) {
int x = 0;
while (S1.charAt(i + x) == S2.charAt(j + x)) {
x++;
if (((i + x) >= S1.length()) || ((j + x) >= S2.length()))
break;
}
if (x > Max) {
Max = x;
Start = i;
}
}
}
System.out.println("S1 " + S1 + " S2 " + S2 + " " + Start + " " + Max);
System.out.println("ans " + S1.substring(Start, (Start+Max)));
if(Start != 0){
if((S1.charAt(Start-1) == ' ')){
if(Start+Max != S1.length()){
if((S1.charAt(Start+Max) == ' ')){
System.out.println(S1.substring(Start, (Start + Max)));
}
}
}
else if((S1.charAt(Start+Max) == ' ')){
int index = S1.indexOf(' ', Start);
System.out.println(S1.substring(index+1, (Start + Max)));
}
else{
int index = S1.indexOf(' ', Start);
int lastIndex = S1.lastIndexOf(' ', (Start+Max));
if(index+1 != lastIndex+1){
System.out.println(S1.substring(index+1, lastIndex));
}
else{
System.out.println(" ");
}
}
}
else if(Start == 0 && Start+Max != S1.length() && (S1.charAt(Start+Max) == ' ')){
System.out.println(S1.substring(Start, (Start + Max)));
}
else if(Start+Max != S1.length()){
int index = S1.lastIndexOf(' ', (Start+Max));
System.out.println(S1.substring(Start, index));
}
else{
System.out.println(S1.substring(Start, (Start + Max)));
}
}
But it is failing for the cases like:
String s1 = "hi there how are you";
String s2 = "i ere ho ar you";
where the output should have been "you" but is blank, because the longest common Substring is "ere ho".
Thanks for the help.
Based on karthik, and after bna's comment that any character-based answer was flawed:
public static ArrayList<String> stringToTokens(String s) {
ArrayList<String> al = new ArrayList<String>();
StringBuilder word = new StringBuilder();
boolean inWord = !s.isEmpty() && Character.isLetter(s.charAt(0));
for (int i=0; i<s.length(); i++) {
char c = s.charAt(i);
if (Character.isLetter(c)) {
word.append(c);
} else if (inWord) {
if (word.length() > 0) {
al.add(word.toString());
word.setLength(0);
}
al.add(""+c);
} else {
al.add(""+c);
}
}
if (word.length() > 0) {
al.add(word.toString());
}
return al;
}
public static String longestCommonWithWords(String sa, String sb) {
ArrayList<String> a = stringToTokens(sa);
ArrayList<String> b = stringToTokens(sb);
int m[][] = new int[a.size()][b.size()];
int bestLength = 0;
HashSet<Integer> bestSet = new HashSet<Integer>();
for (int i=0; i<a.size(); i++) {
for (int j=0; j<b.size(); j++) {
if (a.get(i).equals(b.get(j))) {
m[i][j] = (i==0 || j==0) ? 1 : m[i-1][j-1]+1;
if (m[i][j] > bestLength) {
bestLength = m[i][j];
bestSet.clear();
bestSet.add(i);
} else if (m[i][j] == bestLength) {
bestSet.add(i);
}
} else {
m[i][j] = 0;
}
}
}
// return first result (or empty string if none)
StringBuilder result = new StringBuilder();
if (bestLength > 0) {
int end = bestSet.iterator().next();
int start = end - bestLength;
for (int i=start; i<end; i++) {
result.append(a.get(i+1));
}
}
return result.toString();
}
Tokenization is straightforward (a StringTokenizer would have worked too, but this version handles strange characters better). The LCS algorithm comes straight from the pseudocode in https://en.wikipedia.org/wiki/Longest_common_substring_problem#Pseudocode
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With