
grep regex lookahead or start of string (or lookbehind or end of string)

I want to match a string which may contain a type of character before the match, or the match may begin at the beginning of the string (same for end of string).

For a minimal example, consider the text n.b., which I'd like to match at the beginning of a line, at the end of a line, between two non-word characters, or any combination of these. The easiest way to do this would be to use word boundaries (\bn\.b\.\b), but that doesn't match, because a \b after the final . only succeeds when a word character follows it; similar cases arise for other desired matches that contain non-word characters.
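To see why the word-boundary attempt misbehaves, here is a minimal sketch, assuming GNU grep built with PCRE support (-P). The two sample strings and the variable names are made up for illustration.

```shell
# Assumes GNU grep with -P (PCRE); the sample inputs are invented.
pattern='\bn\.b\.\b'

# Standalone "n.b.": there is no word boundary after the final dot,
# so grep prints nothing (|| true swallows the non-zero exit status).
alone=$(printf '%s\n' 'n.b.' | grep -oP "$pattern" || true)

# "n.b.x": the final dot is followed by a word character, so \b holds
# and the pattern matches exactly where a match is NOT wanted.
glued=$(printf '%s\n' 'n.b.x' | grep -oP "$pattern" || true)

printf 'alone=[%s] glued=[%s]\n' "$alone" "$glued"
```

In other words, the trailing \b inverts the intended condition: it accepts the text only when it runs straight into a following word.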

I'm currently using (^|[^\w])n\.b\.([^\w]|$), which works satisfactorily, but it also includes in the match any non-word characters (such as dashes) that appear immediately before and after the word. I'm doing this in grep, so while I could easily pipe the output into sed, I'm using grep's --color option, which is disabled when piping into another command (for obvious reasons).
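The over-matching described above can be made visible with -o, which prints only the matched text. This is a sketch assuming GNU grep with -P; the sample lines are invented.

```shell
# Assumes GNU grep with -P (PCRE); the sample inputs are invented.
pattern='(^|[^\w])n\.b\.([^\w]|$)'

# Between two dashes: both dashes become part of the match.
dashed=$(printf '%s\n' 'see -n.b.- here' | grep -oP "$pattern")

# At the start of a line: the trailing space becomes part of the match.
leading=$(printf '%s\n' 'n.b. is short' | grep -oP "$pattern")

# Brackets make the captured neighbours easy to see.
printf '[%s]\n[%s]\n' "$dashed" "$leading"
```

With --color, those same neighbouring characters are what would be highlighted along with n.b. itself.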

EDIT: The \K option, i.e. (\K^|[^\w])n\.b\.(\K[^\w]|$), seems to work, but it also discards the color on the match within the output. While I could, again, invoke auxiliary tools, I'd love it if there was a quick and simple solution.

EDIT: I have misunderstood the \K operator; it simply removes all the text from the match preceding its use. No wonder it was failing to color the output.
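The effect of \K depends entirely on where it sits in the pattern. The following sketch, assuming GNU grep with -P and an invented sample string, shows both placements with simplified patterns rather than the exact one from the question.

```shell
# Assumes GNU grep with -P (PCRE); the sample input is invented.
line='-n.b.-'

# \K before the literal: the leading non-word character is dropped
# from the reported match, leaving only "n.b.".
kept=$(printf '%s\n' "$line" | grep -oP '[^\w]\Kn\.b\.')

# \K after the literal: "n.b." itself is dropped, so only the
# trailing dash is reported (and would be the only part colored).
dropped=$(printf '%s\n' "$line" | grep -oP 'n\.b\.\K[^\w]')

printf 'kept=[%s] dropped=[%s]\n' "$kept" "$dropped"
```

The second placement mirrors what the trailing \K in the pattern from the first EDIT does, which is consistent with the word losing its highlight.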

cosmicexplorer asked Apr 28 '15 03:04



2 Answers

If you're using grep, you must be using the -P option, or lookarounds and \K would throw errors. That means you also have negative lookarounds at your disposal. Here's a simpler version of your regex:

(?<!\w)n\.b\.(?!\w)

Also, be aware that (?<=...) and (?<!...) are lookbehinds, and (?=...) and (?!...) are lookaheads. The wording of your title suggests you may have gotten those mixed up, a common beginner's mistake.
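The negative-lookaround pattern can be checked against a few edge cases. This is a minimal sketch, assuming GNU grep with -P; the four sample lines are invented.

```shell
# Assumes GNU grep with -P (PCRE); the sample inputs are invented.
# (?<!\w) is a negative lookbehind, (?!\w) a negative lookahead.
pattern='(?<!\w)n\.b\.(?!\w)'

# Line 1: whole line, line 2: between dashes,
# line 3: glued to a preceding word char, line 4: glued to a following one.
samples=$(printf '%s\n' 'n.b.' 'see -n.b.- here' 'xn.b.' 'n.b.x')

# -o prints only the matched text; -c counts matching lines.
matches=$(printf '%s\n' "$samples" | grep -oP "$pattern")
count=$(printf '%s\n' "$samples" | grep -cP "$pattern")

printf '%s\n' "$matches"
printf 'matching lines: %s\n' "$count"
```

Only the first two lines match, and in both cases the match is exactly n.b., because lookarounds test the neighbouring characters without consuming them.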

Alan Moore answered Oct 14 '22 02:10



Apparently the beginning and end of the string can be matched inside lookbehinds and lookaheads; the obvious solution is then (?<=^|[^\w])n\.b\.(?=[^\w]|$).
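As a quick sanity check, the positive-lookaround form can be compared with the negative-lookaround form (?<!\w)n\.b\.(?!\w) on the same input. This is a sketch assuming GNU grep with -P; the sample lines are invented.

```shell
# Assumes GNU grep with -P (PCRE); the sample inputs are invented.
positive='(?<=^|[^\w])n\.b\.(?=[^\w]|$)'
negative='(?<!\w)n\.b\.(?!\w)'

samples=$(printf '%s\n' 'n.b.' 'see -n.b.- here' 'xn.b.' 'n.b.x')

# -n prefixes each matching line with its line number.
pos_out=$(printf '%s\n' "$samples" | grep -nP "$positive")
neg_out=$(printf '%s\n' "$samples" | grep -nP "$negative")

printf '%s\n' "$pos_out"
if [ "$pos_out" = "$neg_out" ]; then
    echo 'both patterns select the same lines'
fi
```

On these samples the two forms agree, selecting lines 1 and 2 only; the negative form is simply shorter to write.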

cosmicexplorer answered Oct 14 '22 04:10
