Grep Access Multiple lines, find all words between two patterns

Tags: sed, awk, line

I need help scanning text files to find all the words between two patterns. For example, given a .sql file, I need to find all the words between 'from' and 'where'. Grep can only scan one line at a time. What is the best Unix tool for this requirement? Do sed or awk have such features? Pointers to any examples are greatly appreciated.

user1734741 asked Oct 16 '12 15:10


3 Answers

Sed has this:

sed -n -e '/from/,/where/ p' file.sql

This prints every line from a line containing from through the next line containing where, inclusive.
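As a quick check of the range form, here is a minimal sketch run against a made-up query. The file contents and the variable names tmp and range are assumptions for illustration, not something from the question.

```shell
# Hypothetical sample file; the query text is invented for this example.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
select id, name
from users u
join orders o on o.user_id = u.id
where o.total > 10
order by name;
EOF

# Print from the first line matching /from/ through the next line matching /where/.
range=$(sed -n -e '/from/,/where/p' "$tmp")
printf '%s\n' "$range"
rm -f "$tmp"
```

With this input, the select and order by lines are dropped and the three lines from "from users u" through "where o.total > 10" are printed whole.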

To also handle lines that contain both from and where, and to print only the text from from through where:

#!/bin/sed -nf

/from.*where/ {
    s/.*\(from.*where\).*/\1/p
    d
}
/from/ {
    : next
    N
    /where/ {
        s/^[^\n]*\(from.*where\)[^\n]*/\1/p
        d
    }
    $! b next
}

This (written as a sed script) is slightly more complex, and I'll try to explain the details.

The first block runs on any line that contains both a from and a where. If a line matches that pattern, two commands are executed. We use the s (substitute) command to extract only the part between from and where (including the from and where). The p suffix in that command prints the result. The d (delete) command then clears the pattern space (the working buffer), loads the next line, and restarts the script.

The second command starts to execute a series of commands (grouped by the braces) when a line containing from is found. Basically, the commands form a loop that will keep appending lines from the input into the pattern space until a line with a where is found or until we reach the last line.

The : "command" creates a label, a marker in the script that allows us to "jump" back when we want to. The N command reads a line from the input, and appends it to the pattern space (separating the lines with a newline character).

When a where is found, we can print out the contents of the pattern space, but first we have to clean it with the substitute command. It is analogous to the one used previously, but we now replace the leading and trailing .* with [^\n]*, which tells sed to match only non-newline characters, effectively matching a from in the first line and a where in the last line. The d command then clears the pattern space and restarts the script on the next line.

The b command jumps to a label, in our case the label next. However, the $! address says it should not be executed on the last line, which lets us leave the loop. When we leave the loop this way, we have not found a matching where, so the script (running with -n) prints nothing for it.
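To try the script, one option is to save it to a file and pass it to sed with -f. The sketch below assumes GNU sed (it relies on \n inside a bracket expression) and uses a made-up input. The names script, input, and out are placeholders.

```shell
script=$(mktemp)
input=$(mktemp)

# The sed script from above, without the #! line; -n and -f are given on the command line.
cat > "$script" <<'EOF'
/from.*where/ {
    s/.*\(from.*where\).*/\1/p
    d
}
/from/ {
    : next
    N
    /where/ {
        s/^[^\n]*\(from.*where\)[^\n]*/\1/p
        d
    }
    $! b next
}
EOF

# Invented input: one query with from and where on the same line,
# and one where they are on different lines.
cat > "$input" <<'EOF'
select a
from t1 where a > 1;
select b
from t2
join t3 on t2.id = t3.id
where b > 2;
EOF

out=$(sed -n -f "$script" "$input")
printf '%s\n' "$out"
rm -f "$script" "$input"
```

For this input the first query yields "from t1 where" and the second yields the three lines starting at "from t2" and ending at "where", with the text after where trimmed off.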

Note, however, that this has some drawbacks. The following cases won't be handled as expected:

from ... where ... from

from ... from
where

from
where ... where

from
from
where
where

Handling these cases requires more code.
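The first of those cases is easy to reproduce: because the leading .* in the substitution is greedy, only the last from ... where pair on a line survives. A small sketch, assuming a made-up one-line input (line and got are illustrative names):

```shell
line='from a where b from c where d'

# Same substitution as in the first block of the script above.
got=$(printf '%s\n' "$line" | sed -n 's/.*\(from.*where\).*/\1/p')
printf '%s\n' "$got"
```

This prints "from c where"; the earlier "from a where" pair is swallowed by the leading .* and lost.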

Hope this helps =)

Janito Vaqueiro Ferreira Filho answered Nov 16 '22 01:11


With GNU awk, which lets you set RS to a regular expression so that each word becomes its own record:

gawk -v RS='[[:space:]]+' '
   /where/ { found=0 }
   found   {  print  }
   /from/  { found=1 }
' file

The above assumes you do not want the "from" and "where" themselves printed; reorder the lines if you do.
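If gawk is not available, roughly the same filter can be sketched in POSIX awk by looping over the fields of each line instead of setting RS to a regular expression. This is a swapped-in variant, not the command above, and the sample query and the names sql and words are assumptions for illustration.

```shell
# Invented sample query.
sql='select id
from users u
join orders o on o.user_id = u.id
where o.total > 10'

words=$(printf '%s\n' "$sql" | awk '
{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /where/) found = 0   # stop before printing "where"
        if (found) print $i           # print each word between the two patterns
        if ($i ~ /from/)  found = 1   # start with the word after "from"
    }
}')
printf '%s\n' "$words"
```

The three tests are in the same order as in the gawk version, so "from" and "where" are excluded here too. The found flag carries over from one line to the next, which is what lets the match span multiple lines.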

In case it helps, the following idioms describe how to select a range of records given a specific pattern to match:

a) Print all records from some pattern:

awk '/pattern/{f=1}f' file

b) Print all records after some pattern:

awk 'f;/pattern/{f=1}' file

c) Print the Nth record after some pattern:

awk 'c&&!--c;/pattern/{c=N}' file

d) Print every record except the Nth record after some pattern:

awk 'c&&!--c{next}/pattern/{c=N}1' file

e) Print the N records after some pattern:

awk 'c&&c--;/pattern/{c=N}' file

f) Print every record except the N records after some pattern:

awk 'c&&c--{next}/pattern/{c=N}1' file

g) Print the N records from some pattern:

awk '/pattern/{c=N}c&&c--' file

I changed the variable name from "f" for "found" to "c" for "count" where appropriate as that's more expressive of what the variable actually IS.
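To see what a few of these idioms select, here is a small sketch that runs them on the output of seq 10 with /3/ as the pattern and N=2. Passing N with -v and the shell variable names are choices made for this example only.

```shell
# seq 10 prints 1..10, one number per line; only the line "3" matches /3/.
from_match=$(seq 10 | awk '/3/{f=1}f')                # a) records from the match on
after_match=$(seq 10 | awk 'f;/3/{f=1}')              # b) records after the match
nth_after=$(seq 10 | awk -v N=2 'c&&!--c;/3/{c=N}')   # c) the 2nd record after the match
n_after=$(seq 10 | awk -v N=2 'c&&c--;/3/{c=N}')      # e) the 2 records after the match
n_from=$(seq 10 | awk -v N=2 '/3/{c=N}c&&c--')        # g) 2 records starting at the match

printf 'a) %s\n' "$(echo $from_match)"
printf 'b) %s\n' "$(echo $after_match)"
printf 'c) %s\n' "$(echo $nth_after)"
printf 'e) %s\n' "$(echo $n_after)"
printf 'g) %s\n' "$(echo $n_from)"
```

Here a) selects 3 through 10, b) selects 4 through 10, c) selects 5, e) selects 4 and 5, and g) selects 3 and 4.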

Ed Morton answered Nov 16 '22 00:11


You could use ed for this; it allows positive and negative offsets on the addresses of a regex range. If the input is:

seq 10 | tee infile
1
2
3
4
5
6
7
8
9
10

Pipe in the command to ed:

printf '%s\n' '/3/,/6/p' | ed -s infile

That is, print everything from the line containing 3 through the line containing 6.

Result:

3
4
5
6

To get one more line at each end:

printf '%s\n' '/3/-1,/6/+1p' | ed -s infile

Result:

2
3
4
5
6
7

Or the other way around:

printf '%s\n' '/3/+1,/6/-1p' | ed -s infile

Result:

4
5

Thor answered Nov 16 '22 01:11