I have a multiline document from which I'm looking to extract a particular keyword and the word after that. It looks like this:
This is key word1 line 1.
This is line 2.
This is key word2 line 3.
If I use egrep 'key [^s]+ ', the output is:
This is key word1 line 1.
This is key word2 line 3.
However, I'd like the output to be the match only as opposed to the whole line, that is:
key word1
key word2
Is there a way to do that?
If you want to indicate a line break when you construct your regex, use “\n” for Unix line endings or “\r\n” for Windows line endings. Whether you need line breaks in your expression depends on what you are trying to match. Line breaks can be useful “anchors” that define where a pattern occurs relative to the beginning or end of a line.
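To illustrate why the line-ending style matters for anchors, here is a minimal sketch assuming a typical Linux system, where grep splits input on \n only. The two-line sample input is made up for illustration.

```shell
# Two lines with Windows-style CRLF endings (hypothetical sample data).
# grep keeps the trailing \r as part of each line, so '\.$' should find no match.
with_cr=$(printf 'line 1.\r\nline 2.\r\n' | grep -c '\.$' || true)

# Stripping the carriage returns first lets $ anchor directly after the period.
without_cr=$(printf 'line 1.\r\nline 2.\r\n' | tr -d '\r' | grep -c '\.$')

printf 'matching lines with CR: %s\n' "$with_cr"
printf 'matching lines without CR: %s\n' "$without_cr"
```

The `|| true` is there only because grep exits with a non-zero status when nothing matches.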
?= is a positive lookahead, a type of zero-width assertion. It says that the match must be followed by whatever is within the parentheses, but that part isn't included in the match. For example, (?=.*\d) means the match needs to be followed by zero or more characters and then a digit (but again, that part isn't included).
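A lookahead can be tried from the command line if your grep supports Perl-compatible patterns. The sketch below assumes GNU grep built with -P support, which is not guaranteed everywhere, so it checks first. The pattern and the two input lines are made up for illustration.

```shell
# -P (Perl-compatible regex) is a GNU grep extension and may be missing.
if printf 'a\n' | grep -qP 'a' 2>/dev/null; then
  # Match the leading letters only when a digit appears somewhere after them.
  # The digit is required by the lookahead but is not part of the match.
  lookahead=$(printf 'abc1\nabcd\n' | grep -oP '^[a-z]+(?=.*[0-9])' || true)
  printf '%s\n' "$lookahead"
else
  lookahead='unsupported'
  echo 'grep -P is not available here'
fi
```

Only the first input line yields a match, and the trailing digit stays out of the printed text because the assertion is zero-width.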
[] denotes a character class. () denotes a capturing group.
[a-z0-9] -- Matches one character that is in the range a-z OR 0-9.
(a-z0-9) -- Matches and captures the literal text a-z0-9.
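The difference is easy to see with grep -o. This is a small sketch using a made-up input string; the variable names are only for illustration.

```shell
text='a-z0-9 x7'

# Character class: each match is a run of characters drawn from a-z or 0-9.
class_matches=$(printf '%s\n' "$text" | grep -oE '[a-z0-9]+')

# Group: matches the literal sequence "a-z0-9" and nothing else.
group_matches=$(printf '%s\n' "$text" | grep -oE '(a-z0-9)')

printf 'class:\n%s\n' "$class_matches"
printf 'group:\n%s\n' "$group_matches"
```

The class skips the hyphens and the space, so it produces several short matches, while the group finds the literal text exactly once.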
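Many regex flavors recognize common escape sequences such as \n for newline, \t for tab, \r for carriage return, \nnn for an octal number of up to 3 digits, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode code point, and \Uhhhhhhhh for an 8-digit Unicode code point (written with a capital U in flavors such as Python's). Support varies between tools, so check the documentation for the one you are using.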
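With grep's basic and extended patterns, the meaning of \t is not defined by POSIX, so one portable approach is to put a real tab character into the pattern. The following is a minimal sketch with made-up input. It relies on printf interpreting \t in its format string.

```shell
# Build a literal tab character; command substitution keeps it intact.
tab=$(printf '\t')

# Count the lines where "key" is followed by a tab rather than a space.
count=$(printf 'key\tword1\nkey word2\n' | grep -c "key${tab}word")

printf 'lines with a tab after key: %s\n' "$count"
```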
grep(1) has a -o flag that outputs only the matching part of the line. From the man page:
-o, --only-matching Show only the part of a matching line that matches PATTERN.
Your pattern isn't right to get the output you want, though. Try:
$ egrep -o 'key \w+' file
key word1
key word2
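To try this end to end, here is a minimal sketch that recreates the three lines from the question in a temporary file (the file and variable names are hypothetical). It uses grep -E, which is equivalent to egrep, and the POSIX class [[:alnum:]_] in place of \w, since \w is an extension that not every grep supports.

```shell
# Hypothetical sample file reproducing the three lines from the question.
sample=$(mktemp)
printf '%s\n' \
  'This is key word1 line 1.' \
  'This is line 2.' \
  'This is key word2 line 3.' > "$sample"

# -o prints each match on its own line; -E enables extended regex syntax.
# [[:alnum:]_]+ covers letters, digits, and underscore, like \w+.
matches=$(grep -oE 'key [[:alnum:]_]+' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

If a line contains the keyword more than once, -o prints every match from that line separately.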