How to get the positions of all matches in a String?

I have a text document and a query (the query could be more than one word). I want to find the positions of all occurrences of the query in the document.

I thought of using documentText.indexOf(query) or a regular expression, but I could not make either work.

I ended up with the following method.

First, I created a data type called QueryOccurrence:

import java.io.Serializable;

// Holds the start and end position of one occurrence of the query
public class QueryOccurrence implements Serializable {
  private int start;
  private int end;

  public QueryOccurrence() {}

  // nameText is accepted but not stored
  public QueryOccurrence(int nameStart, int nameEnd, String nameText) {
    start = nameStart;
    end = nameEnd;
  }

  public int getStart() {
    return start;
  }

  public int getEnd() {
    return end;
  }

  public void SetStart(int i) {
    start = i;
  }

  public void SetEnd(int i) {
    end = i;
  }
}

Then I used this data type in the following method:

    // Needs: import java.util.ArrayList; import java.util.List;
    public static List<QueryOccurrence> FindQueryPositions(String documentText, String query) {

        // Normalize does the following: lower case, trim, and remove punctuation
        String normalizedQuery = Normalize.Normalize(query);
        String normalizedDocument = Normalize.Normalize(documentText);

        String[] documentWords = normalizedDocument.split(" ");
        String[] queryArray = normalizedQuery.split(" ");

        List<QueryOccurrence> foundQueries = new ArrayList<>();
        QueryOccurrence foundQuery = new QueryOccurrence();

        int index = 0;

        for (String word : documentWords) {

            // remember the word index where a possible match starts
            if (word.equals(queryArray[0])) {
                foundQuery.SetStart(index);
            }

            if (word.equals(queryArray[queryArray.length - 1])) {
                foundQuery.SetEnd(index);
                // only the first and last query words are compared, plus the span length
                if ((foundQuery.getEnd() - foundQuery.getStart()) + 1 == queryArray.length) {

                    // add the found query to the list
                    foundQueries.add(foundQuery);
                    // flush the foundQuery variable to use it again
                    foundQuery = new QueryOccurrence();
                }
            }

            index++;
        }
        return foundQueries;
    }

This method returns a list of all occurrences of the query in the document, each with its position.

Could you suggest an easier and faster way to accomplish this task?

Thanks

637

asked Nov 10 '12 22:11

user692704

1 Answer

Your first approach was a good idea, but String.indexOf does not support regular expressions.
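For a purely literal query, the indexOf idea can still work if the search is restarted after each hit using the two-argument overload String.indexOf(String, int). The following is only a sketch: the class name IndexOfPositions and the method name findAll are made up for illustration, and no normalization is applied to the text.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfPositions {
    // Returns the start index of every occurrence of a literal query.
    static List<Integer> findAll(String documentText, String query) {
        List<Integer> positions = new ArrayList<>();
        if (query.isEmpty()) {
            return positions; // an empty query would otherwise match at every index
        }
        int from = 0;
        int hit;
        while ((hit = documentText.indexOf(query, from)) != -1) {
            positions.add(hit);
            from = hit + 1; // step by one so overlapping occurrences are found too
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findAll("the cat and the hat", "the")); // prints [0, 12]
    }
}
```

If overlapping occurrences should not be reported, advancing by query.length() instead of 1 would skip them.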

Another, easier way that uses a similar approach, but in two steps, is as follows:

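import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<Integer> positions = new ArrayList<>();
Pattern p = Pattern.compile(queryPattern);  // insert your pattern here
Matcher m = p.matcher(documentText);
while (m.find()) {
   positions.add(m.start());  // start index of the current match
}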

Here positions will hold the start position of every match.
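If the query is plain text rather than a regular expression, it may help to wrap it in Pattern.quote so that characters such as "." or "?" are matched literally. The sketch below also records the end of each match through Matcher.end(), which is exclusive. The class name RegexPositions and the method name findAll are hypothetical, and the CASE_INSENSITIVE flag is only an assumption standing in for the lower-casing step from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPositions {
    // Returns {start, end} character offsets for each match; end is exclusive.
    static List<int[]> findAll(String documentText, String query) {
        // Pattern.quote treats the query as a literal string, not as a regex.
        Pattern p = Pattern.compile(Pattern.quote(query), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(documentText);
        List<int[]> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
        }
        return spans;
    }

    public static void main(String[] args) {
        // prints 0-8 and then 10-18
        for (int[] s : findAll("New York, new york!", "new york")) {
            System.out.println(s[0] + "-" + s[1]);
        }
    }
}
```

Note that these are character offsets into the original text, not word indexes as in the question's method.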

166

answered Sep 29 '22 11:09

Tim

Tags: java, string, match
