
Odd issue with Java regex Kleene star

Tags: java, regex

I was trying to answer a regex question for someone and came across something that made me scratch my head. Given the following code...

import java.util.regex.Pattern;

public class SplitPlus {
    public static void main(String[] args) {
        String test = "Hello, how are you today?";
        // Split on runs of one or more non-word characters.
        Pattern p = Pattern.compile("(\\W)+");
        String[] words = p.split(test);
        System.out.println("--" + words[0] + "--");
        System.out.println("--" + words[1] + "--");
    }
}

I get the expected results of

--Hello--
--how--

However when I use ...

import java.util.regex.Pattern;

public class SplitStar {
    public static void main(String[] args) {
        String test = "Hello, how are you today?";
        // Split on runs of zero or more non-word characters.
        Pattern p = Pattern.compile("(\\W)*");
        String[] words = p.split(test);
        System.out.println("--" + words[0] + "--");
        System.out.println("--" + words[1] + "--");
    }
}

I get the results

----
--H--

Is there a reason * doesn't work exactly like the + in this situation?

Shaded asked Apr 27 '26 17:04


2 Answers

* matches zero or more occurrences, so (\W)* also matches the empty string. As a result, every position in the input becomes a delimiter, and most of those delimiters are zero-width.

Edit

By the way, that doesn't mean * is acting non-greedily. If you print the whole array returned by split, you get this:

[, H, e, l, l, o, , h, o, w, , a, r, e, , y, o, u, , t, o, d, a, y]

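Notice that there is only one empty element between "o" and "h", not two. The greedy match consumes ", " as a single delimiter, and the zero-width match right after it produces the one empty element. Below, each delimiter is surrounded by {}.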

{}H{}e{}l{}l{}o{, }{}h{}o{}w{ }{}a{}r{}e{ }{}y{}o{}u{ }{}t{}o{}d{}a{}y{?}
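
If it helps to see those delimiters directly, here is a minimal sketch that walks the matches with Matcher.find() and records where each one starts and ends. The class name DelimiterTrace and the helper trace are made up for illustration, and the input is shortened to "Hello, how" to keep the output small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterTrace {
    // Hypothetical helper: records every match as "start-end:text".
    static List<String> trace(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Zero-width matches show up with start == end and empty text.
        for (String match : trace("(\\W)*", "Hello, how")) {
            System.out.println(match);
        }
    }
}
```

Every match except the one covering ", " has the same start and end index, which is what makes it a zero-width delimiter. The match covering ", " spans both characters at once, so the quantifier is still greedy.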
Mark Peters answered Apr 29 '26 06:04


Because + means one or more occurrences of the preceding element, whereas * means zero or more occurrences, so * can also match the empty string.
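
As a rough side-by-side, the sketch below splits the same sentence with both quantifiers. The class name PlusVersusStar and the helper split are illustrative only. The capturing group is dropped here on the assumption that it isn't needed for splitting.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PlusVersusStar {
    // Hypothetical helper: splits the input on the regex and returns the pieces as a list.
    static List<String> split(String regex, String input) {
        return Arrays.asList(Pattern.compile(regex).split(input));
    }

    public static void main(String[] args) {
        String test = "Hello, how are you today?";
        // One or more non-word characters: only real separators are delimiters.
        System.out.println(split("\\W+", test)); // [Hello, how, are, you, today]
        // Zero or more: the empty string between any two letters is a delimiter too.
        System.out.println(split("\\W*", test));
    }
}
```

The exact output of the second call depends on the Java version. Since Java 8, a zero-width match at the very start of the input no longer produces a leading empty string, so the first element is "H" rather than the empty string shown in the question.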

maerics answered Apr 29 '26 08:04


