Let's say I have this list of words:
String[] stopWords = new String[]{"i","a","and","about","an","are","as","at","be","by","com","for","from","how","in","is","it","not","of","on","or","that","the","this","to","was","what","when","where","who","will","with","the","www"};
Then I have this text:
String text = "I would like to do a nice novel about nature AND people";
Is there a method out there that matches the stop words and removes them while ignoring case, something like this?
String noStopWordsText = remove(text, stopWords);
Result:
" would like do nice novel nature people"
If you know of a regex that would work, great, but I would really prefer something like a Commons solution that is a bit more performance oriented.
BTW, right now I'm using this Commons method, which lacks proper case-insensitive handling:
private static final String[] stopWords = new String[]{"i", "a", "and", "about", "an", "are", "as", "at", "be", "by", "com", "for", "from", "how", "in", "is", "it", "not", "of", "on", "or", "that", "the", "this", "to", "was", "what", "when", "where", "who", "will", "with", "the", "www", "I", "A", "AND", "ABOUT", "AN", "ARE", "AS", "AT", "BE", "BY", "COM", "FOR", "FROM", "HOW", "IN", "IS", "IT", "NOT", "OF", "ON", "OR", "THAT", "THE", "THIS", "TO", "WAS", "WHAT", "WHEN", "WHERE", "WHO", "WILL", "WITH", "THE", "WWW"};
private static final String[] blanksForStopWords = new String[]{"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""};
noStopWordsText = StringUtils.replaceEach(text, stopWords, blanksForStopWords);
To remove a substring from a string, call the replace() method, passing it the substring and an empty string as parameters, e.g. str.replace("example", ""). The replace() method will return a new string, where the first occurrence of the supplied substring is removed.
Java Program to Remove a Substring from a String: the key method here is replace(). It can be called on a string to replace the first parameter with the second parameter. When the second parameter is an empty string, it effectively deletes the substring from the main string.
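A minimal Java sketch of that idea, for reference. The class and method names here are made up for illustration. Note that in Java, String.replace(CharSequence, CharSequence) replaces every occurrence of the target and matches case-sensitively, so on its own it does not cover the case-insensitive requirement from the question; String.replaceFirst is the call that stops after the first match.

```java
import java.util.regex.Pattern;

final class RemoveSubstringDemo {
    // Removes every occurrence of target (literal, case-sensitive match).
    static String removeAll(String text, String target) {
        return text.replace(target, "");
    }

    // Removes only the first occurrence. replaceFirst takes a regex,
    // so the literal target is quoted first.
    static String removeFirst(String text, String target) {
        return text.replaceFirst(Pattern.quote(target), "");
    }

    public static void main(String[] args) {
        System.out.println(removeAll("a-b-a", "a"));     // prints -b-
        System.out.println(removeFirst("a-b-a", "a"));   // prints -b-a
        System.out.println(removeAll("AND and", "and")); // the upper-case AND is left in place
    }
}
```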
Java List remove() method is used to remove elements from the list.
Create a regular expression with your stop words, make it case insensitive, and then use the matcher's replaceAll method to replace all matches with an empty string:
import java.util.regex.*;
Pattern stopWords = Pattern.compile("\\b(?:i|a|and|about|an|are|...)\\b\\s*", Pattern.CASE_INSENSITIVE);
Matcher matcher = stopWords.matcher("I would like to do a nice novel about nature AND people");
String clean = matcher.replaceAll("");
The ... in the pattern is just me being lazy; continue the list of stop words there.
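If typing out the whole alternation by hand is a pain, the pattern can probably be built from the existing array instead. This is only a sketch: StopWordRegex, compile and remove are hypothetical names, and it assumes the word list is non-empty. Pattern.quote is used so that a stop word containing regex metacharacters would still be matched literally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class StopWordRegex {
    // Builds \b(?:w1|w2|...)\b\s* from the given words, case-insensitively.
    // Assumes at least one word is supplied.
    static Pattern compile(String... words) {
        String alternation = Arrays.stream(words)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b\\s*",
                Pattern.CASE_INSENSITIVE);
    }

    // Replaces every stop-word match (plus trailing whitespace) with nothing.
    static String remove(String text, Pattern stopWords) {
        return stopWords.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        Pattern p = compile("i", "a", "and", "about", "an", "to");
        System.out.println(remove(
                "I would like to do a nice novel about nature AND people", p));
        // prints: would like do nice novel nature people
    }
}
```

Compile the pattern once (for example into a static final field) and reuse it for every text.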
Another method is to loop over all the stop words and use String's replaceAll method. The problem with that approach is that replaceAll compiles a new regular expression on each call, so it's not very efficient to use in loops. Also, String's replaceAll has no parameter for the case-insensitive flag; the only way to get it there is to embed the inline flag (?i) in the pattern itself.
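If you do want one pattern per word, the recompilation cost can be avoided by compiling each pattern once up front and reusing the compiled objects. A rough sketch, with PrecompiledStopWords being a made-up name and not an existing library class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class PrecompiledStopWords {
    private final List<Pattern> patterns = new ArrayList<>();

    // Compile one case-insensitive, whole-word pattern per stop word, once.
    PrecompiledStopWords(String... words) {
        for (String word : words) {
            patterns.add(Pattern.compile(
                    "\\b" + Pattern.quote(word) + "\\b\\s*",
                    Pattern.CASE_INSENSITIVE));
        }
    }

    // Applies every precompiled pattern in turn; nothing is recompiled here.
    String remove(String text) {
        String result = text;
        for (Pattern p : patterns) {
            result = p.matcher(result).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        PrecompiledStopWords remover = new PrecompiledStopWords("a", "and", "about");
        System.out.println(remover.remove("A cat and a dog ABOUT town"));
        // prints: cat dog town
    }
}
```

This still scans the text once per stop word, so a single combined pattern is likely the better choice for long lists.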
Edit: I added \b around the pattern to make it match whole words only. I also added \s* to make it gobble up any spaces after a match; that may not be necessary.
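To see why the \b matters, here is a small comparison using a hypothetical two-word stop list ("and", "a"). Without the word boundaries, the pattern also eats letters inside longer words such as "band"; with them, only whole words are removed.

```java
import java.util.regex.Pattern;

final class WordBoundaryDemo {
    // Hypothetical two-word list, kept short for illustration.
    static final Pattern UNBOUNDED =
            Pattern.compile("(?:and|a)", Pattern.CASE_INSENSITIVE);
    static final Pattern BOUNDED =
            Pattern.compile("\\b(?:and|a)\\b\\s*", Pattern.CASE_INSENSITIVE);

    static String strip(Pattern p, String text) {
        return p.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "A band and a banana";
        // Mangles "band" and "banana" because matches are allowed inside words.
        System.out.println("[" + strip(UNBOUNDED, text) + "]");
        // Only whole-word matches are removed.
        System.out.println("[" + strip(BOUNDED, text) + "]"); // prints [band banana]
    }
}
```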
You can build a regular expression that matches all the stop words [for example "a ", note the space here] and end up with
str = str.replaceAll(regexpression, "");
OR
String[] stopWords = new String[]{" i ", " a ", " and ", " about ", " an ", " are ", " as ", " at ", " be ", " by ", " com ", " for ", " from ", " how ", " in ", " is ", " it ", " not ", " of ", " on ", " or ", " that ", " the ", " this ", " to ", " was ", " what ", " when ", " where ", " who ", " will ", " with ", " the ", " www "};
String text = " I would like to do a nice novel about nature AND people ";
for (String stopword : stopWords) {
text = text.replaceAll("(?i)"+stopword, " ");
}
System.out.println(text);
output:
would like do nice novel nature people
There might be a better way.
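One possible non-regex alternative, sketched under the assumption that words are separated by whitespace and have no punctuation attached: split the text into tokens and drop those found in a lower-cased HashSet. StopWordFilter is a hypothetical name; this is not a Commons API. Unlike the regex versions, it also collapses runs of whitespace to single spaces.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class StopWordFilter {
    private final Set<String> stopWords = new HashSet<>();

    // Store the stop words lower-cased so lookups can ignore case.
    StopWordFilter(String... words) {
        for (String w : words) {
            stopWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    // Splits on whitespace, drops stop words, rejoins with single spaces.
    // Tokens with attached punctuation (e.g. "and,") are not recognised.
    String remove(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(token -> !stopWords.contains(token.toLowerCase(Locale.ROOT)))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        StopWordFilter filter = new StopWordFilter("i", "a", "and", "about", "to");
        System.out.println(filter.remove(
                "I would like to do a nice novel about nature AND people"));
        // prints: would like do nice novel nature people
    }
}
```

Each token costs one hash lookup, so the work grows with the length of the text and not with the number of stop words.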