I want to grep with patterns read from a file containing regexes. When a pattern matches, grep prints the matched string but not the pattern. How can I get the pattern instead of the matched string?
pattern.txt
Apple (Ball|chocolate|fall) Donut
donut (apple|ball) Chocolate
Donut Gorilla Chocolate
Chocolate (English|Fall) apple gorilla
gorilla chocolate (apple|ball)
(ball|donut) apple
strings.txt
apple ball Donut
donut ball chocolate
donut Ball Chocolate
apple donut
chocolate ball Apple
This is the grep command:
grep -Eix -f pattern.txt strings.txt
This command prints the matched strings from strings.txt:
apple ball Donut
donut ball chocolate
donut Ball Chocolate
But I want to find which patterns from pattern.txt produced the matches:
Apple (Ball|chocolate|fall) Donut
donut (apple|ball) Chocolate
The lines in pattern.txt can be lower case or upper case, with or without regex elements, and can contain any number of words and regex elements. The only regex syntax used is brackets and the pipe.
I don't want to loop over pattern.txt and run grep once per line, because that is slow. Is there a way to make grep print the matching pattern, or its line number in the pattern file? Or is there another command that can do the job without being too slow?
Displaying only the matched string: by default, grep displays the entire line that contains the match. The -o option makes grep display only the matched string. Showing line numbers: grep -n prefixes each output line with its line number in the input file.
By default, the grep command in Unix prints the lines from a file that contain the specified pattern. The same command can display the lines that do not contain the pattern when run with the -v option.
The grep utility searches the given input files selecting lines which match one or more patterns. The type of patterns is controlled by the options specified. By default, a pattern matches an input line if any regular expression (RE) in the pattern matches the input line without its trailing newline.
Indeed, grep returns 0 if it matches, and non-zero if it does not.
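As a minimal sketch of relying on that exit status, the snippet below tests one pattern against one string with `grep -q`, which prints nothing and only sets the status. The helper name `matches` is hypothetical and used only for illustration; it uses the same `-E`, `-i` and `-x` options as the command in the question.

```shell
# Hypothetical helper: matches PATTERN STRING
# Succeeds (exit status 0) when STRING as a whole matches PATTERN,
# ignoring case; returns non-zero otherwise.
matches() {
  printf '%s\n' "$2" | grep -Eixq -e "$1"
}

if matches 'Apple (Ball|chocolate|fall) Donut' 'apple ball Donut'; then
  echo "matched"
fi

if ! matches 'Donut Gorilla Chocolate' 'apple ball Donut'; then
  echo "no match"
fi
```

Calling grep once per pattern like this is exactly the slow loop the question wants to avoid, so it is only meant to show how the exit status behaves.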
I have no idea how to do this with grep, but with GNU awk:
$ awk '
BEGIN { IGNORECASE = 1 }          # GNU awk only: case-insensitive matching
NR==FNR {                         # process the pattern file
    a[$0]                         # hash the entries to a
    next                          # process next line
}
{                                 # process the strings file
    for(i in a)                   # loop over all pattern file entries
        if($0 ~ "^(" i ")$") {    # whole-line match, like grep -x
            print i               # output the matching pattern file entry
            # delete a[i]         # uncomment to delete matched patterns from a
            # next                # uncomment to end searching after first match
        }
}' pattern strings
outputs:
D (A|B) C
For each line in strings, the script loops over every line in pattern to see whether more than one pattern matches. There is only one match due to case sensitivity. You can work around that, for example, with GNU awk's IGNORECASE.
Also, if you want each matched pattern file entry to be output only once, you could delete it from a after the first match: add delete a[i] after the print. That might also give you some performance advantage.
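Since the question also asks for the pattern's line number, here is a hedged variant of the same idea that should run on any POSIX awk rather than only GNU awk. It is a sketch, not the answer's original code: the function name `match_patterns`, the temporary sample files, and the `N:pattern` output format are all assumptions made for illustration. It lower-cases both sides with `tolower()` instead of setting IGNORECASE, which is only safe because the patterns contain nothing but words, brackets and pipes.

```shell
# Hypothetical sample files mirroring part of the question's data.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' \
  'Apple (Ball|chocolate|fall) Donut' \
  'donut (apple|ball) Chocolate' \
  'Donut Gorilla Chocolate' \
  '(ball|donut) apple' > "$tmp/pattern.txt"
printf '%s\n' \
  'apple ball Donut' \
  'donut ball chocolate' \
  'donut Ball Chocolate' \
  'apple donut' > "$tmp/strings.txt"

# match_patterns PATTERN_FILE STRINGS_FILE
# Prints "<pattern line number>:<pattern>" once for every pattern that
# matches at least one whole line of STRINGS_FILE, ignoring case.
match_patterns() {
  awk '
    NR == FNR {                           # first file: the patterns
      re[FNR]  = "^(" tolower($0) ")$"    # anchored, lower-cased regex
      raw[FNR] = $0                       # original text, for printing
      n = FNR
      next
    }
    {                                     # second file: the strings
      line = tolower($0)
      for (i = 1; i <= n; i++)
        if (!(i in seen) && line ~ re[i]) {
          seen[i]                         # report each pattern only once
          print i ":" raw[i]
        }
    }' "$1" "$2"
}

match_patterns "$tmp/pattern.txt" "$tmp/strings.txt"
# prints:
# 1:Apple (Ball|chocolate|fall) Donut
# 2:donut (apple|ball) Chocolate
```

Both files are read only once, but every string is still tested against every unmatched pattern, so the cost grows with the product of the two file sizes. Skipping patterns already marked in `seen` plays the same role as the `delete a[i]` suggestion above.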