I've been playing with this regex in Java for ages and can't get it to work:
(?:^| )(?:the|and|at|in|or|on|off|all|beside|under|over|next)(?: |$)
The following:
pattern.matcher("the cat in the hat").replaceAll(" ")
gives me "cat the hat". Another example input is "the cat in of the next hat", which gives me "cat of next hat".
Is there any way I can make this regex replacement work without having to break them out into multiple separate regexes for each word and try to replace a string repeatedly?
To replace every match of a regular expression in Java, use the replaceAll() method. It returns a new String in which each character sequence matching the regular expression has been replaced by the replacement String.
String.replace() replaces all occurrences of a specific character or substring in a given String object without using regex. String has two overloaded replace() methods.
Both replace() and replaceAll() substitute every occurrence in Java. The difference is that replaceAll() interprets its first argument as a regular expression, whereas replace() treats it as literal text. To replace only the first match, use replaceFirst().
\\s+ --> matches one or more whitespace characters. \\\\s+ --> matches the literal \ followed by s one or more times.
Yeah, you can do this pretty easily. You just need to use word boundaries, which is what you're trying to describe with: (?:^| )
Just do this instead:
\b(?:the|and|at|in|or|on|off|all|beside|under|over|next)\b
Your original didn't capture, but as mentioned in the comments, if you want to capture the options you can use a capturing group instead of a non-capturing one:
\b(the|and|at|in|or|on|off|all|beside|under|over|next)\b
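For what it's worth, here is a minimal sketch of how the word-boundary pattern might be used from Java. The class name StopWordRemover and the strip helper are made up for illustration, and the extra step that collapses leftover spaces and trims the result is my own assumption about the desired output, not part of the pattern above.

```java
import java.util.regex.Pattern;

final class StopWordRemover {
    // Same word list as in the question, wrapped in \b word boundaries.
    // The backslashes are doubled because this is a Java string literal.
    private static final Pattern STOP_WORDS = Pattern.compile(
            "\\b(?:the|and|at|in|or|on|off|all|beside|under|over|next)\\b");

    // Hypothetical helper: removes the listed words, then collapses the
    // runs of spaces they leave behind (assumed to be the desired cleanup).
    static String strip(String input) {
        String withoutWords = STOP_WORDS.matcher(input).replaceAll("");
        return withoutWords.replaceAll(" +", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("the cat in the hat"));          // cat hat
        System.out.println(strip("the cat in of the next hat"));  // cat of hat
    }
}
```

Because \b consumes no characters, neighbouring words no longer compete for the same space. One thing to keep in mind: \b also sees punctuation such as a hyphen as a boundary, so the on in on-site would be removed as well.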
The problem with yours is that the leading and trailing spaces are included in the matches, and a char cannot be found in two matches.
So with the input the_cat_in_the_hat (the underscores replace the spaces here, to make the explanation clearer):
the_ is matched; remaining string: cat_in_the_hat
_in_ is matched; remaining string: the_hat
the is not matched, since it is preceded neither by a space nor by the beginning of the (original) string.
You could have used lookarounds instead, since they behave like conditions (i.e. an if):
(?<=^| )(?:the|and|at|in|or|on|off|all|beside|under|over|next)(?= |$)
This way, you would have:
the is matched; remaining string: _cat_in_the_hat
in is matched; remaining string: _the_hat
the is matched; remaining string: _hat
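To see the difference for yourself, a small sketch like the following lists what each pattern actually matches. The class name MatchTrace and the matches helper are hypothetical, and spaces are printed as underscores, as in the explanation above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchTrace {
    static final String WORDS =
            "(?:the|and|at|in|or|on|off|all|beside|under|over|next)";
    // Pattern from the question: the surrounding spaces are part of each match.
    static final Pattern CONSUMING =
            Pattern.compile("(?:^| )" + WORDS + "(?: |$)");
    // Lookaround variant: the surrounding spaces are checked but not consumed.
    static final Pattern LOOKAROUND =
            Pattern.compile("(?<=^| )" + WORDS + "(?= |$)");

    // Hypothetical helper: collects every match, showing spaces as underscores.
    static List<String> matches(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            found.add(m.group().replace(' ', '_'));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "the cat in the hat";
        System.out.println(matches(CONSUMING, input));   // [the_, _in_]
        System.out.println(matches(LOOKAROUND, input));  // [the, in, the]
    }
}
```

The first pattern misses the second the because the space in front of it was already used up by the _in_ match, while the lookaround version finds all three words.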
But @JonathanMee's answer is the best solution, since word boundaries were implemented precisely for this purpose ;)