
How Matcher.find() works [duplicate]

I am testing a small stub that uses the Matcher and Pattern classes. See the following code:

package scjp2.escape.sequence.examples;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Sample_19 {

    public static void main(String a[]){
        String stream = "ab34ef";
        Pattern pattern = Pattern.compile("\\d*");

        // * is a greedy quantifier: \d* matches zero or more digits,
        // so it can also match the empty string.
        Matcher matcher = pattern.matcher(stream);

        while(matcher.find()){
            System.out.print(matcher.start()+matcher.group());
        }
    }

}

The string we are matching against is "ab34ef", which has length 6.

Now let's look at the iterations:


Iteration   matcher.start()   matcher.group()
1           0                 ""
2           1                 ""
3           2                 "34"
4           4                 ""
5           5                 ""

Concatenating matcher.start() + matcher.group() for each iteration, the output I expect is 0123445.

But the stub prints 01234456.

I can't understand where the "6" comes from. String indexes start at zero, so the maximum index here should be 5.

The loop iterates six times. How? Any suggestions?

Gunjan Shah asked Jun 23 '12 17:06



2 Answers

Your regular expression can match zero characters. Match positions lie between characters, so a string of length 6 has positions 0 through 6. The final match is a zero-width string at the end of the input, after the character at index 5, so its start index is 6.


As an aside, you might also find it easier to understand what is going on if you use separators to make the output more readable:

System.out.println(matcher.start()+ ": " + matcher.group());

Results:

0: 
1: 
2: 34
4: 
5: 
6: 
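To make the zero-width matches easier to see, it may help to print matcher.end() alongside matcher.start(). The following is only a minimal sketch: the class name ZeroWidthMatches and the helper describeMatches are hypothetical names introduced here for illustration, not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroWidthMatches {

    // Hypothetical helper: lists every match found by find() as "start-end:text".
    // An empty match has start == end and an empty text part.
    static List<String> describeMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.start() + "-" + matcher.end() + ":" + matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Same pattern and input as in the question.
        for (String line : describeMatches("\\d*", "ab34ef")) {
            System.out.println(line);
        }
    }
}
```

With the question's input, the last entry should read 6-6: with nothing after the colon, which is the empty match at the very end of the string (index 6 equals the string's length).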


Mark Byers answered Oct 18 '22 20:10



Your expression uses *, which means zero or more digits, so it can also match no digits at all.

Change your regular expression like this:

Pattern pattern = Pattern.compile("\\d+");

Using + means one or more digits, so the pattern can no longer match the empty string.
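As a rough comparison of the two quantifiers, the sketch below runs the question's loop with both patterns. The class name PlusQuantifierDemo and the helper startsAndGroups are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlusQuantifierDemo {

    // Hypothetical helper: concatenates start() and group() for each match,
    // mirroring the print statement in the question.
    static String startsAndGroups(String regex, String input) {
        StringBuilder out = new StringBuilder();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            out.append(matcher.start()).append(matcher.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \d* also reports empty matches, including the one at the end of the input.
        System.out.println(startsAndGroups("\\d*", "ab34ef")); // prints 01234456
        // \d+ needs at least one digit, so only "34" at index 2 is reported.
        System.out.println(startsAndGroups("\\d+", "ab34ef")); // prints 234
    }
}
```

Whether + is the right choice depends on what the loop is for: it fits when only the actual runs of digits matter, while * remains useful for seeing how find() treats empty matches.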

dash1e answered Oct 18 '22 21:10
