
How to get the positions of all matches in a String?

Tags: java, string, match

I have a text document and a query (the query could be more than one word). I want to find the position of all occurrences of the query in the document.

I thought of using documentText.indexOf(query) or a regular expression, but I could not make either work.

I ended up with the following approach:

First, I created a data type called QueryOccurrence:

import java.io.Serializable;

// Holds the start and end position of one occurrence of the query
public class QueryOccurrence implements Serializable{
  public QueryOccurrence(){}
  private int start;
  private int end;

  // nameText is accepted but not stored
  public QueryOccurrence(int nameStart,int nameEnd,String nameText){
    start=nameStart;
    end=nameEnd;
  }

  public int getStart(){
    return start;
  }

  public int getEnd(){
    return end;
  }

  public void SetStart(int i){
    start=i;
  }

  public void SetEnd(int i){
    end=i;
  }
}

Then I used this data type in the following method:

public static List<QueryOccurrence> FindQueryPositions(String documentText, String query) {

    // Normalize does the following: lower-cases, trims, and removes punctuation
    String normalizedQuery = Normalize.Normalize(query);
    String normalizedDocument = Normalize.Normalize(documentText);

    String[] documentWords = normalizedDocument.split(" ");
    String[] queryArray = normalizedQuery.split(" ");

    List<QueryOccurrence> foundQueries = new ArrayList<>();
    QueryOccurrence foundQuery = new QueryOccurrence();

    // index counts words in the document, so positions are word indices, not character offsets
    int index = 0;

    for (String word : documentWords) {

        if (word.equals(queryArray[0])) {
            foundQuery.SetStart(index);
        }

        if (word.equals(queryArray[queryArray.length - 1])) {
            foundQuery.SetEnd(index);
            // only the first and last query words are compared; the span must be as long as the query
            if ((foundQuery.getEnd() - foundQuery.getStart()) + 1 == queryArray.length) {

                // add the found query to the list
                foundQueries.add(foundQuery);
                // reset the foundQuery variable to use it again
                foundQuery = new QueryOccurrence();
            }
        }

        index++;
    }
    return foundQueries;
}

This method returns a list of all occurrences of the query in the document, each with its position.

Could you suggest an easier and faster way to accomplish this task?

Thanks

user692704 asked Nov 10 '12 22:11



1 Answer

Your first approach was a good idea, but String.indexOf does not support regular expressions.
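For a purely literal query, the indexOf idea can still be made to work by calling the two-argument overload, String.indexOf(String, int), in a loop. The following is only a minimal sketch under that assumption; the class name LiteralPositions and the method findAll are hypothetical names chosen for illustration, and no normalization of the text is done here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the question's code.
class LiteralPositions {

    // Returns the character offset at which each occurrence of the literal query starts.
    static List<Integer> findAll(String text, String query) {
        List<Integer> positions = new ArrayList<>();
        if (query.isEmpty()) {
            return positions; // an empty query would match at every offset, so skip it
        }
        int from = 0;
        int hit;
        while ((hit = text.indexOf(query, from)) != -1) {
            positions.add(hit);
            from = hit + 1; // advance by one so overlapping occurrences are also found
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findAll("to be or not to be", "to be")); // prints [0, 13]
    }
}
```

Advancing by hit + 1 reports overlapping occurrences; advancing by hit + query.length() instead would skip them, which may be preferable depending on what counts as a separate match.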

An easier way that follows a similar idea, in two steps (compile a pattern, then iterate over its matches), is as follows:

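import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<Integer> positions = new ArrayList<>();
Pattern p = Pattern.compile(queryPattern);  // insert your pattern here
Matcher m = p.matcher(documentText);
while (m.find()) {
   positions.add(m.start());  // character offset where this match begins
}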

Here positions will hold the start positions of all the matches.
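If the end of each match is needed as well, as in the question's QueryOccurrence, Matcher.end() can be recorded alongside Matcher.start(). The sketch below is one possible way to do that, not a definitive implementation: the class name MatchSpans and the method findSpans are hypothetical, the query is assumed to be literal text (hence Pattern.quote), and case-insensitive matching is assumed as a rough stand-in for the question's lower-casing step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class MatchSpans {

    // Returns {start, end} character offsets per match; end is exclusive, as with Matcher.end().
    static List<int[]> findSpans(String documentText, String query) {
        // Pattern.quote makes the query match literally instead of being parsed as a regex.
        Pattern p = Pattern.compile(Pattern.quote(query), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(documentText);
        List<int[]> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    public static void main(String[] args) {
        // Prints 0-5 and then 13-18
        for (int[] span : findSpans("To be or not to be", "to be")) {
            System.out.println(span[0] + "-" + span[1]);
        }
    }
}
```

Note that these are character offsets into the original text, whereas the method in the question reports word indices in the normalized text. Matcher.find() also resumes after the previous match, so overlapping occurrences are not reported.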

Tim answered Sep 29 '22 11:09