Need to grep for first occurrences of multiple strings

Tags: grep

I am trying to return the first occurrence of each of several strings, i.e., I want to select the lines from the following text where 1259, 3009, and 1589 first occur.

ADWN    1259    11:00   B23
ADWN    3009    12:00   B19
DDWN     723    11:30   B04
ADWN    1589    14:20   B12
ADWN    1259    11:10   B23
DDWN    2534    13:00   B16
ADWN    3009    11:50   B14

This gives me all matches:

grep '1259\|3009\|1589'  somelog.log

And this gives me only the first match:

grep -m 1  '1259\|3009\|1589'  somelog.log

I want to return the following:

ADWN    1259    11:00   B23
ADWN    3009    12:00   B19
ADWN    1589    14:20   B12

I think that creating a file with the required values, and then looping through the file, passing each number individually into the grep command will give me what I am looking for, but I haven't found an example of this. Is there a simple solution for this, is a loop the best way to handle this, or has this example already been answered elsewhere?

Thanks in advance for your ideas and suggestions--

Clyde

asked Nov 03 '12 00:11

comuter geek

1 Answer

One way using awk:

awk '!array[$2]++ && $2 ~ /^1259$|^3009$|^1589$/' file.txt

Results:

ADWN    1259    11:00   B23
ADWN    3009    12:00   B19
ADWN    1589    14:20   B12

edit:

I should really get into the habit of reading the whole question first. I see that you're thinking of creating a file with the values whose first occurrence you'd like to find. Put these in a file called values.txt with one value per line. For example, here are the contents of values.txt:

1259
3009
1589

Then run this:

awk 'FNR==NR { array[$0]++; next } $2 in array { print; delete array[$2] }' values.txt file.txt

Results:

ADWN    1259    11:00   B23
ADWN    3009    12:00   B19
ADWN    1589    14:20   B12

1st command explanation:

awk keeps every second-column value ($2) it has seen in an array. A line is printed only when its $2 has not been seen before (!array[$2]++) and equals one of the three listed values. Since no action is given, awk prints the whole line by default.
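To make the order of evaluation easier to follow, the one-liner can be written out with an explicit action. This is only a sketch of the same idea: the temporary file from mktemp, the array name seen, and the variable name result are placeholders chosen for illustration, and the data is the sample from the question.

```shell
# Recreate the sample data from the question in a throwaway file.
log=$(mktemp)
cat > "$log" <<'EOF'
ADWN    1259    11:00   B23
ADWN    3009    12:00   B19
DDWN     723    11:30   B04
ADWN    1589    14:20   B12
ADWN    1259    11:10   B23
DDWN    2534    13:00   B16
ADWN    3009    11:50   B14
EOF

result=$(awk '
  # Only consider lines whose second field is exactly a wanted value.
  $2 ~ /^(1259|3009|1589)$/ {
    # seen[$2]++ yields 0 (false) the first time a value appears,
    # so the negation is true only for the first occurrence.
    if (!seen[$2]++) print
  }
' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

Unlike the one-liner, this version only counts values that pass the regex test, but the printed lines are the same.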

2nd command explanation:

FNR is the number of records read so far from the current input file.
NR is the total number of records read so far across all input files.
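A quick way to see the difference is to print both counters while awk reads two small files. The sketch below uses throwaway files created with mktemp, and the variable name counters is just an illustrative choice.

```shell
# Two tiny throwaway files, two lines each.
values=$(mktemp)
log=$(mktemp)
printf '1259\n3009\n' > "$values"
printf 'ADWN 1259 11:00 B23\nDDWN 723 11:30 B04\n' > "$log"

# Print FNR and NR for every record; they agree only in the first file.
counters=$(awk '{ print FNR, NR, (FNR == NR ? "first file" : "later file") }' "$values" "$log")
printf '%s\n' "$counters"

rm -f "$values" "$log"
```

FNR restarts at 1 when the second file begins, while NR keeps counting, which is why the two only match while the first file is being read.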

The FNR==NR { ... } block runs only while the first input file is being read (provided that file is not empty), because only there do the two counters agree. So for each line in values.txt, we add the whole line ($0) as a key of an array (I've called it array, but you could give it another name). next makes awk skip the rest of the program and move on to the next input line. Once FNR==NR is no longer true, awk is reading the second file in the argument list. There we check whether the second column ($2) is a key in the array; if it is, we print the line and remove that key from the array. By using delete, we effectively cap the output at one line per value.
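For comparison, the loop described in the question could be sketched roughly as follows. This assumes a grep that supports -m and -w (GNU and BSD grep do; neither option is in POSIX), and the temporary files and the variable name loop_result are placeholders for illustration.

```shell
# Throwaway copies of the sample log and the list of wanted values.
log=$(mktemp)
values=$(mktemp)
cat > "$log" <<'EOF'
ADWN    1259    11:00   B23
ADWN    3009    12:00   B19
DDWN     723    11:30   B04
ADWN    1589    14:20   B12
ADWN    1259    11:10   B23
DDWN    2534    13:00   B16
ADWN    3009    11:50   B14
EOF
printf '1259\n3009\n1589\n' > "$values"

# One grep per value: -m 1 stops after the first matching line,
# -w only accepts the value as a whole word.
loop_result=$(while IFS= read -r value; do
  grep -m 1 -w -- "$value" "$log"
done < "$values")

printf '%s\n' "$loop_result"
rm -f "$log" "$values"
```

The loop scans the log once per value and prints matches in the order of values.txt, whereas the awk command reads the log once and keeps the log's order. Also note that -w matches the value anywhere on the line, not just in the second column.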

answered Sep 20 '22 15:09

Steve
