 

Removing strings from another string in java

Tags:

java

string

Let's say I have this list of words:

 String[] stopWords = new String[]{"i","a","and","about","an","are","as","at","be","by","com","for","from","how","in","is","it","not","of","on","or","that","the","this","to","was","what","when","where","who","will","with","the","www"};

Then I have this text:

 String text = "I would like to do a nice novel about nature AND people";

Is there a method somewhere out there that matches the stopWords and removes them while ignoring case, like this?

 String noStopWordsText = remove(text, stopWords);

Result:

 " would like do nice novel nature people"

If you know a regex that would work, great, but I would really prefer something like a Commons solution that is a bit more performance oriented.

BTW, right now I'm using this Commons method, which lacks proper case-insensitive handling:

 private static final String[] stopWords = new String[]{"i", "a", "and", "about", "an", "are", "as", "at", "be", "by", "com", "for", "from", "how", "in", "is", "it", "not", "of", "on", "or", "that", "the", "this", "to", "was", "what", "when", "where", "who", "will", "with", "the", "www", "I", "A", "AND", "ABOUT", "AN", "ARE", "AS", "AT", "BE", "BY", "COM", "FOR", "FROM", "HOW", "IN", "IS", "IT", "NOT", "OF", "ON", "OR", "THAT", "THE", "THIS", "TO", "WAS", "WHAT", "WHEN", "WHERE", "WHO", "WILL", "WITH", "THE", "WWW"};
 private static final String[] blanksForStopWords = new String[]{"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""};

 noStopWordsText = StringUtils.replaceEach(text, stopWords, blanksForStopWords);     
asked Jan 22 '11 at 17:01 by MatBanik



2 Answers

Create a regular expression from your stop words, make it case-insensitive, and then use the Matcher's replaceAll method to replace all matches with an empty string:

import java.util.regex.*;

Pattern stopWords = Pattern.compile("\\b(?:i|a|and|about|an|are|...)\\b\\s*", Pattern.CASE_INSENSITIVE);
Matcher matcher = stopWords.matcher("I would like to do a nice novel about nature AND people");
String clean = matcher.replaceAll("");

The ... in the pattern is just me being lazy; continue the list of stop words there.

Another method is to loop over all the stop words and use String's replaceAll method. The problem with that approach is that replaceAll compiles a new regular expression on every call, so it's not very efficient inside loops. Also, String's replaceAll has no parameter for the Pattern.CASE_INSENSITIVE flag, so you would have to embed the inline (?i) flag in each pattern instead.

Edit: I added \b around the pattern to make it match whole words only. I also added \s* to make it swallow any spaces that follow a match; that may not be necessary.
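The following is a self-contained sketch, not part of the original answer, that fills in the elided pattern by building it from the question's stopWords array instead of typing it out. The class name StopWordRemover and the method remove are hypothetical. Each word is passed through Pattern.quote so the alternation stays literal, and the pattern is compiled once into a static field, which is the point made above about not recompiling inside a loop.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StopWordRemover {
    // Stop words from the question (the duplicate "the" is harmless in an alternation)
    private static final String[] STOP_WORDS = {"i", "a", "and", "about", "an", "are", "as", "at",
            "be", "by", "com", "for", "from", "how", "in", "is", "it", "not", "of", "on", "or",
            "that", "the", "this", "to", "was", "what", "when", "where", "who", "will", "with",
            "the", "www"};

    // Compiled once: \b...\b restricts matches to whole words, \s* swallows trailing spaces,
    // and CASE_INSENSITIVE makes "AND" match "and"
    private static final Pattern STOP_WORD_PATTERN = Pattern.compile(
            "\\b(?:" + Arrays.stream(STOP_WORDS)
                    .map(Pattern::quote)
                    .collect(Collectors.joining("|")) + ")\\b\\s*",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: removes every stop word (and the spaces after it) from text
    static String remove(String text) {
        return STOP_WORD_PATTERN.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // prints "would like do nice novel nature people"
        System.out.println(remove("I would like to do a nice novel about nature AND people"));
    }
}
```

Because \s* consumes the space after the leading "I", this version returns the text without the leading space shown in the question's expected result. Words that merely contain a stop word, such as "nature" (which contains "a"), are left alone thanks to \b.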

answered Nov 02 '22 at 12:11 by Theo


You can build a regular expression that matches all the stop words (for example " a ", note the spaces here) and end up with

str.replaceAll(regexpression,"");

OR

 String[] stopWords = new String[]{" i ", " a ", " and ", " about ", " an ", " are ", " as ", " at ", " be ", " by ", " com ", " for ", " from ", " how ", " in ", " is ", " it ", " not ", " of ", " on ", " or ", " that ", " the ", " this ", " to ", " was ", " what ", " when ", " where ", " who ", " will ", " with ", " the ", " www "};
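 String text = " I would like to do a nice novel about nature AND people ";

 // (?i) makes each pattern case-insensitive; the spaces around each stop word
 // (and the padding on text) limit matches to whole words
 for (String stopword : stopWords) {
     text = text.replaceAll("(?i)" + stopword, " ");
 }
 System.out.println(text);

Output: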

 would like do nice novel nature people 

There might be a better way.

answered Nov 02 '22 at 14:11 by jmj
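Neither answer shows the non-regex, Commons-style approach the question asks for. One possible alternative, not taken from either answer, is to split the text into tokens and drop each token whose lower-cased form is in a HashSet of stop words. The sketch below assumes words are separated by spaces; the class name TokenStopWordRemover is hypothetical. It avoids regular expressions entirely (String.split with a single ordinary character such as " " does not go through the regex engine), but whether it is actually faster than a precompiled Pattern would need measuring.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

class TokenStopWordRemover {
    // Lower-cased stop words from the question; a HashSet gives O(1) average lookups
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "i", "a", "and", "about", "an", "are", "as", "at", "be", "by", "com", "for",
            "from", "how", "in", "is", "it", "not", "of", "on", "or", "that", "the", "this",
            "to", "was", "what", "when", "where", "who", "will", "with", "www"));

    // Splits on single spaces, skips the empty tokens produced by repeated spaces,
    // drops stop words case-insensitively and re-joins the rest with single spaces.
    // Assumption: punctuation attached to a word (e.g. "a,") prevents it from matching.
    static String remove(String text) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : text.split(" ")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // prints "would like do nice novel nature people"
        System.out.println(remove("I would like to do a nice novel about nature AND people"));
    }
}
```

Unlike the regex versions, this normalizes whitespace: runs of spaces collapse to one and leading or trailing spaces disappear. Locale.ROOT keeps lower-casing stable regardless of the default locale (for example, Turkish dotted and dotless i).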