
How to match the Nth word of a line containing a specific word using regex

Tags:

regex

I'm trying to find the correct regular expression to match the Nth word of a line containing a specific word.

For example, if I have this input:

this is the first line - blue
this is the second line - green
this is the third line - red

I want to match the seventh word of the lines containing the word "second" and return green.

I'm using Rubular to test the regular expression.

I already tried out this regular expression without success - it is matching the next line:

(.*second.*)(?<data>.*?\s){7}(.*)

Another example input:

this is the Foo line - blue
this is the Bar line - green
this is the Test line - red

I want to match the fourth word of the lines containing the word "red" and return Test.

The word I want to match can come either before or after the word I use to select the line.

Jorge asked Jan 31 '14 16:01


2 Answers

You can use this to match a line containing second and grab the 7th word:

^(?=.*\bsecond\b)(?:\S+ ){6}(\S+)

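Make sure that the global and multiline flags are active: multiline makes ^ match at the start of every line, and global returns every matching line rather than only the first.

^ matches the beginning of a line.

(?=.*\bsecond\b) is a positive lookahead to make sure there's the word second somewhere in that particular line.

(?:\S+ ){6} matches the first 6 words, each followed by a single space, without capturing them.

(\S+) captures the 7th word in group 1.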
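
If you want to try the pattern outside an online tester, here is a minimal sketch in Python (my choice of language for illustration; the question doesn't name one). re.MULTILINE plays the role of the multiline flag, and findall plays the role of the global flag. The variable names are just placeholders.

```python
import re

# Sample input taken from the question.
text = """this is the first line - blue
this is the second line - green
this is the third line - red"""

# re.MULTILINE makes ^ match at the start of every line, not only at
# the start of the whole string.
pattern = re.compile(r"^(?=.*\bsecond\b)(?:\S+ ){6}(\S+)", re.MULTILINE)

# With exactly one capture group, findall returns the captured text of
# each match, which gives the "global" behaviour.
matches = pattern.findall(text)
print(matches)  # ['green']
```

Note that the lone "-" counts as a word here, because \S+ matches any run of non-space characters.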

regex101 demo


You can apply the same principle with other requirements.

With a line containing red and getting the 4th word...

^(?=.*\bred\b)(?:\S+ ){3}(\S+)
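
Since only the word and the repeat count change between the two patterns, you could build the pattern from parameters. This is a rough sketch in Python; the function name nth_word_on_lines_with is made up for this example, and it assumes words are separated by single spaces, as the patterns above do.

```python
import re

def nth_word_on_lines_with(text, word, n):
    """Return the n-th word (1-based) of every line containing `word`."""
    # re.escape guards against regex metacharacters in `word`.
    # {n - 1} words are skipped so that the capture group lands on word n.
    pattern = re.compile(
        rf"^(?=.*\b{re.escape(word)}\b)(?:\S+ ){{{n - 1}}}(\S+)",
        re.MULTILINE,
    )
    return pattern.findall(text)

# Second sample input from the question.
text = """this is the Foo line - blue
this is the Bar line - green
this is the Test line - red"""

print(nth_word_on_lines_with(text, "red", 4))  # ['Test']
```

Because the lookahead scans the whole line from its start, it doesn't matter whether the selecting word comes before or after the word you want to capture.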
Jerry answered Oct 21 '22 13:10


You asked for regex, and you got a very good answer.

Sometimes you need to ask for the solution, and not specify the tool.

Here is the one-liner that I think best suits your need:

awk '/second/ {print $7}' < inputFile.txt

Explanation:

/second/     - for any line that matches this regex (in this case, literal 'second')
print $7     - print the 7th field (by default, fields are separated by runs of spaces or tabs)

I think it is much easier to understand than the regex - and it's more flexible for this kind of processing.
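
If awk isn't available, the same idea (select the line, split it into fields, pick one) can be sketched in a few lines of Python. The function name below is hypothetical, and this is only an approximation of the awk one-liner: awk prints an empty line when the field doesn't exist, whereas this sketch skips such lines.

```python
def fields_from_matching_lines(text, needle, field_number):
    """Rough analogue of: awk '/needle/ {print $N}' for a literal needle."""
    results = []
    for line in text.splitlines():
        # Plain substring test, like awk's /second/ with a literal string.
        # As in awk, "second" would also match inside "seconds".
        if needle in line:
            # split() with no argument splits on runs of whitespace,
            # similar to awk's default field splitting.
            fields = line.split()
            if len(fields) >= field_number:
                # awk fields are 1-based, Python lists are 0-based.
                results.append(fields[field_number - 1])
    return results

text = """this is the first line - blue
this is the second line - green
this is the third line - red"""

print(fields_from_matching_lines(text, "second", 7))  # ['green']
```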

Floris answered Oct 21 '22 11:10