
ack regex: Matching two words in order in the same line

Tags: regex, ack

I would like to find lines in files that include two words, word_1 and word_2, in that order, as in Line A below but not as in Line B or Line C:

Line A: ... word_1 .... word_2 ....
Line B: ... word_1 ....
Line C: ... word_2 ....

I have tried

$ ack '*word_1*word_2'
$ ack '(word_1)+*(word_2)+'

and the same commands with ^ prepended to the regex (in an attempt to follow the Perl regex syntax).

None of these commands return the files or the lines I am interested in.

What am I doing wrong?

Thanks!

Amelio Vazquez-Reina asked Apr 09 '11 20:04



1 Answer

You want to find word_1, followed by anything, any number of times, followed by word_2. That should be

word_1.*word_2 
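A quick way to check this pattern against the cases from the question is sketched below. The file name sample.txt and the extra reversed-order Line D are made up for illustration, and grep -E stands in for ack so the sketch runs where ack is not installed. The same pattern can be given to ack as ack 'word_1.*word_2'.

```shell
# Build a throwaway file with the three cases from the question,
# plus a hypothetical Line D that has the words in reverse order.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'Line A: foo word_1 bar word_2 baz' \
  'Line B: foo word_1 bar' \
  'Line C: foo word_2 bar' \
  'Line D: foo word_2 bar word_1' > "$tmpdir/sample.txt"

# word_1, then any run of characters, then word_2, all on one line.
matches=$(grep -E 'word_1.*word_2' "$tmpdir/sample.txt")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

Only Line A should be printed. Line D is skipped because the pattern is matched left to right, so word_2 has to come after word_1.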

You seem to be using * as it is often used in command-line searches (as a wildcard), but in regexes it is a quantifier for the preceding character, meaning "match it zero or more times". For example, the regex a* matches zero or more a's, whereas the regex a+ matches at least one a.

The regex metacharacter meaning "match any character" (except a newline, by default) is ., so .* means "match anything, any number of times". See perlrequick for a brief introduction to the topic.
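The difference between the quantifiers and the . metacharacter can be tried out with a small sketch like the one below. full_match is a hypothetical helper written for this illustration (it is not part of ack or grep), and grep -E is again used as a stand-in regex engine. These particular patterns should behave the same way in Perl-style regexes.

```shell
# Report whether the whole string matches the pattern (prints yes or no).
# full_match is a hypothetical helper, not part of ack or grep.
full_match() {
  if printf '%s\n' "$2" | grep -Eq "^($1)\$"; then
    echo yes
  else
    echo no
  fi
}

r1=$(full_match 'a*' '')         # a* accepts zero a's
r2=$(full_match 'a+' '')         # a+ needs at least one a
r3=$(full_match 'a+' 'aaa')      # one or more a's
r4=$(full_match 'x.*y' 'x123y')  # . stands for any character
r5=$(full_match 'x*y' 'x123y')   # * alone only repeats the x
echo "$r1 $r2 $r3 $r4 $r5"
```

The last two calls show the mistake from the question in miniature: x*y does not mean "x, anything, y", because * only repeats the x that precedes it.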

dsolimano answered Oct 15 '22 06:10