I want to match a string that may be preceded by a certain type of character, or that may start at the beginning of the string (and likewise for the end of the string).
For a minimal example, consider the text n.b., which I'd like to match either at the beginning and end of a line, or between two non-word characters, or some combination of the two. The easiest way to do this would be to use word boundaries (\bn\.b\.\b), but that doesn't match; similar cases arise for other desired matches that contain non-word characters.
I'm currently using (^|[^\w])n\.b\.([^\w]|$), which works satisfactorily, but it also matches the non-word characters (such as dashes) that appear immediately before and after the word, if there are any. I'm doing this in grep, so while I could easily pipe the output into sed, I'm using grep's --color option, which is disabled when piping into another command (for obvious reasons).
EDIT: The \K option (i.e. (\K^|[^\w])n\.b\.(\K[^\w]|$)) seems to work, but it also discards the color on the match within the output. While I could, again, invoke auxiliary tools, I'd love it if there were a quick and simple solution.
EDIT: I have misunderstood the \K operator; it simply removes all the text from the match preceding its use. No wonder it was failing to color the output.
If you're using grep, you must be using the -P option, or lookarounds and \K would throw errors. That means you also have negative lookarounds at your disposal. Here's a simpler version of your regex:
(?<!\w)n\.b\.(?!\w)
Also, be aware that (?<=...) and (?<!...) are lookbehinds, and (?=...) and (?!...) are lookaheads. The wording of your title suggests you may have gotten those mixed up, a common beginner's mistake.
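To see how the three patterns from this thread differ, here is a minimal sketch. It assumes Python's re module as a stand-in for grep -P (the two engines are not identical, but they should treat \b, negated classes, and these single-character negative lookarounds the same way). The sample strings and the matched_text helper are made up for illustration.

```python
import re

# Word-boundary attempt from the question.
WORD_BOUNDARY = re.compile(r"\bn\.b\.\b")
# Consuming version from the question: the neighbours become part of the match.
CONSUMING = re.compile(r"(^|[^\w])n\.b\.([^\w]|$)")
# Negative-lookaround version: the neighbours are checked but not consumed.
LOOKAROUND = re.compile(r"(?<!\w)n\.b\.(?!\w)")


def matched_text(pattern, line):
    """Return the text of the first match in line, or None if nothing matches."""
    m = pattern.search(line)
    return m.group(0) if m else None


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the original question.
    for line in ["n.b.", "see -n.b.- here", "(n.b.)", "xn.b.", "n.b.x"]:
        print(repr(line),
              matched_text(WORD_BOUNDARY, line),
              matched_text(CONSUMING, line),
              matched_text(LOOKAROUND, line))
```

The trailing \b fails on a bare n.b. because the final dot is a non-word character, so a boundary exists there only when a word character follows; that is why the word-boundary pattern matches inside n.b.x, which is the opposite of what is wanted. The lookaround version returns just n.b. in every accepted case, so only that text would be highlighted.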
Apparently, matching the beginning or end of the string is possible inside lookbehinds and lookaheads; the obvious solution is then (?<=^|[^\w])n\.b\.(?=[^\w]|$).