Here is my code:
// Import io so we can use file objects
import java.io.*;

public class SearchThe {
    public static void main(String[] args) {
        try {
            String stringSearch = "the";
            // Open test.txt (relative to the working directory) as a buffered reader
            BufferedReader bf = new BufferedReader(new FileReader("test.txt"));
            // Start a line count and declare a string to hold our current line.
            int linecount = 0;
            String line;
            // Let the user know what we are searching for
            System.out.println("Searching for " + stringSearch + " in file...");
            // Loop through each line, stashing the line into our line variable.
            while ((line = bf.readLine()) != null) {
                // Increment the count and find the index of the word
                linecount++;
                int indexfound = line.indexOf(stringSearch);
                // If greater than -1, means we found the word
                if (indexfound > -1) {
                    System.out.println("Word was found at position " + indexfound + " on line " + linecount);
                }
            }
            // Close the file after done searching
            bf.close();
        }
        catch (IOException e) {
            System.out.println("IO Error Occurred: " + e.toString());
        }
    }
}
I want to find every occurrence of the word "the" in the file test.txt. The problem is that once my program finds the first "the", it stops looking for more.
It also treats a word like "then" as if it were the word "the".
Use a case-insensitive regex with word boundaries to find all instances and variations of "the".
indexOf("the") cannot distinguish "the" from "then", since both start with "the". Likewise, it finds "the" in the middle of "anathema".
To avoid this, use a regex and search for "the" with a word boundary (\b) on either side. Word boundaries work better than splitting on " " or searching with indexOf(" the ") (spaces on either side), neither of which would find "the." or other instances next to punctuation. You can also make the search case-insensitive so that it finds "The" as well.
// Requires java.util.regex.Pattern and java.util.regex.Matcher
// \b is a word boundary, so "then" and "anathema" do not match
Pattern p = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);
while ((line = bf.readLine()) != null) {
    linecount++;
    Matcher m = p.matcher(line);
    // indicate all matches on the line
    while (m.find()) {
        System.out.println("Word was found at position " +
            m.start() + " on line " + linecount);
    }
}
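To try the word-boundary pattern without a file, here is a minimal self-contained sketch. The class name WordBoundaryDemo, the helper findThe, and the sample sentence are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: start index of every whole-word "the" in the line,
    // ignoring case.
    public static List<Integer> findThe(String line) {
        Pattern p = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(line);
        List<Integer> positions = new ArrayList<>();
        // find() resumes after the previous match, so every match is visited
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // "then" and "anathema" are skipped; "The" and "the." are found
        System.out.println(findThe("The cat, then the dog; anathema to the."));
        // prints [0, 14, 35]
    }
}
```

Compiling the pattern once and reusing it for every line, as the snippet above does, avoids recompiling the regex inside the loop.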
You shouldn't use indexOf, because it matches the search text anywhere in the string, including inside longer words. "then" contains the string "the", so it counts as a match too.
More about indexOf:
public int indexOf(String str, int fromIndex)
Returns the index within this string of the first occurrence of the specified substring, starting at the specified index. The integer returned is the smallest value k for which:
k >= fromIndex && this.startsWith(str, k)
If no such value of k exists, then -1 is returned.
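The fromIndex parameter is what lets indexOf continue past the first hit. Here is a rough sketch of that loop; the class IndexOfAllDemo and the helper allIndexesOf are hypothetical names for illustration. Note that it still matches substrings, so "then" is reported as well.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfAllDemo {
    // Hypothetical helper: every index at which target occurs in line,
    // found by restarting indexOf just past the previous hit.
    public static List<Integer> allIndexesOf(String line, String target) {
        List<Integer> positions = new ArrayList<>();
        if (target.isEmpty()) {
            return positions; // guard: an empty target would never stop matching
        }
        int from = 0;
        int found;
        while ((found = line.indexOf(target, from)) != -1) {
            positions.add(found);
            from = found + 1; // continue searching after this occurrence
        }
        return positions;
    }

    public static void main(String[] args) {
        // The "the" inside "then" (index 4) is reported too
        System.out.println(allIndexesOf("the then the", "the"));
        // prints [0, 4, 9]
    }
}
```

This fixes the "stops after the first match" problem but not the "then" problem, which is why a whole-word check is still needed.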
You should split each line into words, loop over the words, and compare each one to "the".
String[] words = line.split(" ");
for (String word : words) {
    if (word.equals("the")) {
        System.out.println("Found the word");
    }
}
The snippet above also visits every "the" in the line, whereas indexOf always returns only the first occurrence.
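Splitting on a single space leaves punctuation attached, so "the." or "The" would not equal "the". One possible variation, shown only as a sketch, splits on runs of non-word characters and compares case-insensitively. The class SplitWordsDemo and the helper countWord are hypothetical names for illustration.

```java
public class SplitWordsDemo {
    // Hypothetical helper: counts whole-word occurrences of target in line.
    public static int countWord(String line, String target) {
        int count = 0;
        // "\\W+" splits on one or more non-word characters (spaces, commas, periods...)
        for (String word : line.split("\\W+")) {
            if (word.equalsIgnoreCase(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Counts "The", "the" and "the." but not "then" or "anathema"
        System.out.println(countWord("The cat, then the dog; anathema to the.", "the"));
        // prints 3
    }
}
```

Unlike the regex Matcher, this approach gives a count but not the position of each match within the line.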