I'm trying to get the correct regular expression to match the Nth word of a line containing a specific word.
For example, if I have this input:
this is the first line - blue
this is the second line - green
this is the third line - red
I want to match the seventh word of the lines containing the word "second" and return green.
I'm using Rubular to test the regular expression.
I already tried out this regular expression without success - it is matching the next line:
(.*second.*)(?<data>.*?\s){7}(.*)
Another example input:
this is the Foo line - blue
this is the Bar line - green
this is the Test line - red
I want to match the fourth word of the lines containing the word "red" and return Test.
The word I want to match can come either before or after the word I use to select the line.
To run a “whole words only” search using a regular expression, simply place the word between two word boundaries, for example ‹\bcat\b›. The first ‹\b› requires the ‹c› to occur at the very start of the string, or after a nonword character.
Pass the string you want to search into the Regex object's search() method. This returns a Match object. Call the Match object's group() method to return a string of the actual matched text.
The word boundary \b matches positions where one side is a word character (usually a letter, digit or underscore—but see below for variations across engines) and the other side is not a word character (for instance, it may be the beginning of the string or a space character).
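As a small illustration of the two points above (word boundaries and the search()/group() calls), here is a minimal sketch using Python's re module. The sample strings and variable names are made up for the example.

```python
import re

# \b on both sides means "cat" must stand alone as a whole word.
whole_word = re.compile(r"\bcat\b")

# search() returns a Match object, or None when nothing matches.
match = whole_word.search("my cat sat")
found = match.group() if match else None

# "cat" inside "concatenate" has word characters on both sides,
# so neither \b can match and the search fails.
no_match = whole_word.search("concatenate")

print(found)
print(no_match)
```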
You can use this to match a line containing second and grab the 7th word:
^(?=.*\bsecond\b)(?:\S+ ){6}(\S+)
Make sure that the global and multiline flags are active.
^ matches the beginning of a line.
(?=.*\bsecond\b) is a positive lookahead that makes sure the word second appears somewhere on that line.
(?:\S+ ){6} matches the first 6 words, each followed by a single space.
(\S+) captures the 7th word.
regex101 demo
You can apply the same principle with other requirements.
For a line containing red, getting the 4th word:
^(?=.*\bred\b)(?:\S+ ){3}(\S+)
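Here is a minimal Python sketch of the same pattern, generalised to any keyword and word position. The helper name nth_word_on_matching_lines is hypothetical, and Python's re module is only one possible engine: re.MULTILINE plays the role of the multiline flag, and findall() stands in for the global flag.

```python
import re

def nth_word_on_matching_lines(text, keyword, n):
    """Return the n-th word (1-based) of every line containing keyword as a whole word.

    Hypothetical helper for illustration; assumes words are separated
    by single spaces, as in the pattern above.
    """
    pattern = re.compile(
        r"^(?=.*\b" + re.escape(keyword) + r"\b)"  # line must contain the keyword
        r"(?:\S+ ){" + str(n - 1) + r"}"           # skip the first n-1 words
        r"(\S+)",                                  # capture the n-th word
        re.MULTILINE,                              # let ^ match at every line start
    )
    return pattern.findall(text)

colours = (
    "this is the first line - blue\n"
    "this is the second line - green\n"
    "this is the third line - red\n"
)
names = (
    "this is the Foo line - blue\n"
    "this is the Bar line - green\n"
    "this is the Test line - red\n"
)

print(nth_word_on_matching_lines(colours, "second", 7))
print(nth_word_on_matching_lines(names, "red", 4))
```

Because the lookahead only checks that the keyword is somewhere on the line, the captured word can come before or after it.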
You asked for regex, and you got a very good answer.
Sometimes you need to ask for the solution, and not specify the tool.
Here is the one-liner that I think best suits your need:
awk '/second/ {print $7}' < inputFile.txt
Explanation:
/second/ - for any line that matches this regex (in this case, literal 'second')
print $7 - print the 7th field (by default, fields are separated by runs of whitespace)
I think it is much easier to understand than the regex - and it's more flexible for this kind of processing.
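For comparison, roughly the same field-splitting idea can be sketched in Python without any regex. The function name nth_field_of_matching_lines is made up for this example. Unlike awk, which prints an empty line when the field is missing, this sketch skips lines that have fewer than n fields.

```python
def nth_field_of_matching_lines(text, needle, n):
    """Return the n-th whitespace-separated field (1-based) of lines containing needle.

    Hypothetical helper mirroring: awk '/needle/ {print $n}'
    """
    results = []
    for line in text.splitlines():
        fields = line.split()  # splits on runs of whitespace, like awk's default
        # Plain substring test, similar to awk's /needle/ for a literal word.
        if needle in line and len(fields) >= n:
            results.append(fields[n - 1])
    return results

sample = (
    "this is the first line - blue\n"
    "this is the second line - green\n"
    "this is the third line - red\n"
)

print(nth_field_of_matching_lines(sample, "second", 7))
```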