I've been playing with this regex in Java for ages and can't get it to work:
(?:^| )(?:the|and|at|in|or|on|off|all|beside|under|over|next)(?: |$)
The following:
pattern.matcher("the cat in the hat").replaceAll(" ")
gives me " cat the hat". Another example input is "the cat in of the next hat", which gives me " cat of next hat".
Is there any way I can make this regex replacement work without having to break them out into multiple separate regexes for each word and try to replace a string repeatedly?
To replace every match of a regular expression in Java, use the replaceAll() method. It returns a new String in which each substring that matches the regular expression is replaced with the given replacement string.
String.replace() replaces all occurrences of a specific character or substring in a given String without using regex. There are two overloads of replace() in Java: one taking char arguments and one taking CharSequence arguments.
Both methods replace every occurrence. The difference is that replaceAll() treats its first argument as a regular expression, whereas replace() treats it as literal text. To replace only the first match, use replaceFirst().
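A quick way to check how these methods differ is to run them on the same input. The sketch below is only an illustration; the class name ReplaceVariants and its helper methods are made up for this example.

```java
class ReplaceVariants {
    // replace(): the target "." is literal text, and every occurrence is replaced.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll(): the first argument is a regex, so "." matches any character.
    static String regexAll(String s) {
        return s.replaceAll(".", "-");
    }

    // replaceFirst(): also a regex, but only the first match is replaced.
    static String regexFirst(String s) {
        return s.replaceFirst("\\.", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c"));    // a-b-c
        System.out.println(regexAll("a.b.c"));   // -----
        System.out.println(regexFirst("a.b.c")); // a-b.c
    }
}
```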
\\s+ (written as a Java string literal) --> matches one or more whitespace characters. \\\\s+ --> matches a literal \ followed by one or more s characters.
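The escaping difference is easy to verify with a small test. This is a minimal sketch; the class name EscapeDemo and the sample inputs are assumptions chosen for illustration.

```java
class EscapeDemo {
    // "\\s+" reaches the regex engine as \s+ : one or more whitespace characters.
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // "\\\\s+" reaches the regex engine as \\s+ : a literal backslash, then one or more 's'.
    static String replaceBackslashS(String s) {
        return s.replaceAll("\\\\s+", "_");
    }

    public static void main(String[] args) {
        System.out.println(collapseWhitespace("a  b\t c"));  // a b c
        System.out.println(replaceBackslashS("x\\sss y"));   // x_ y
    }
}
```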
Yes, you can do this pretty easily. You just need to use word boundaries, which is what you're trying to describe with (?:^| ). Just do this instead:
\b(?:the|and|at|in|or|on|off|all|beside|under|over|next)\b
Your original didn't capture, but as mentioned in the comments, if you want to capture the options you can use a capturing group instead of a non-capturing one:
\b(the|and|at|in|or|on|off|all|beside|under|over|next)\b
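Here is a minimal Java sketch of the word-boundary version. The class name StopWordDemo and the method removeStopWords are made up for this example, and the whitespace cleanup after the replacement (collapsing runs of spaces and trimming) is an extra step that was not part of the question.

```java
import java.util.regex.Pattern;

class StopWordDemo {
    // \b is written as "\\b" inside a Java string literal.
    private static final Pattern STOP_WORDS = Pattern.compile(
        "\\b(?:the|and|at|in|or|on|off|all|beside|under|over|next)\\b");

    static String removeStopWords(String input) {
        // Remove the words themselves; the spaces around them are untouched.
        String stripped = STOP_WORDS.matcher(input).replaceAll("");
        // Assumption: we want single spaces and no leading/trailing space.
        return stripped.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(removeStopWords("the cat in the hat"));         // cat hat
        System.out.println(removeStopWords("the cat in of the next hat")); // cat of hat
    }
}
```

Note that \b also sees a boundary next to punctuation, so the "in" of a hyphenated word such as "in-law" would be matched as well. The pattern is also case-sensitive unless compiled with Pattern.CASE_INSENSITIVE.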
The problem with yours is that the leading and trailing spaces are included in the matches, and a char cannot be found in two matches.
So with the input the_cat_in_the_hat (the underscores replace the spaces here, to make the explanation clearer):
the_, remaining string: cat_in_the_hat
_in_, remaining string: the_hat
the is not matched, since it is preceded neither by a space nor by the beginning of the (original) string.
You could have used lookarounds instead, since they behave like conditions (i.e. if):
(?<=^| )(?:the|and|at|in|or|on|off|all|beside|under|over|next)(?= |$)

This way, you would have:
the, remaining string: _cat_in_the_hat
in, remaining string: _the_hat
the, remaining string: _hat
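The two walkthroughs above can be reproduced in Java by listing what each pattern actually matches. This is only a sketch; the class name MatchTrace and the helper method matches are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTrace {
    static final String WORDS = "(?:the|and|at|in|or|on|off|all|beside|under|over|next)";
    // Original pattern: the surrounding spaces are consumed by each match.
    static final String CONSUMING = "(?:^| )" + WORDS + "(?: |$)";
    // Lookaround pattern: the surrounding spaces are checked but not consumed.
    static final String LOOKAROUND = "(?<=^| )" + WORDS + "(?= |$)";

    // Collects every successive match, in the order the regex engine finds them.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "the cat in the hat";
        System.out.println(matches(CONSUMING, input));  // [the ,  in ]
        System.out.println(matches(LOOKAROUND, input)); // [the, in, the]
    }
}
```

The first list has only two entries because the space after "in" was already consumed, so the second "the" has no leading space left to match against.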
But @JonathanMee's answer is the best solution, since word boundaries were implemented precisely for this purpose ;)