To get third column in tab delimited file, you can simply call it as cut -f3 <file> . Different delimiter can be passed via -d parameter, e.g.: cut -f3 -d: . You can even get multiple columns, or characters at fixed positions on the line.
To look at the first few lines of a file, type head filename, where filename is the name of the file you want to look at, and then press <Enter>. By default, head shows you the first 10 lines of a file.
Use comm -12 file1 file2 to get common lines in both files. You may also needs your file to be sorted to comm to work as expected. Or using grep command you need to add -x option to match the whole line as a matching pattern. The F option is telling grep that match pattern as a string not a regex match.
The following will print the line matching TERMINATE
till the end of the file:
sed -n -e '/TERMINATE/,$p'
Explained: -n
disables default behavior of sed
of printing each line after executing its script on it, -e
indicated a script to sed
, /TERMINATE/,$
is an address (line) range selection meaning the first line matching the TERMINATE
regular expression (like grep) to the end of the file ($
), and p
is the print command which prints the current line.
This will print from the line that follows the line matching TERMINATE
till the end of the file:
(from AFTER the matching line to EOF, NOT including the matching line)
sed -e '1,/TERMINATE/d'
Explained: 1,/TERMINATE/
is an address (line) range selection meaning the first line for the input to the 1st line matching the TERMINATE
regular expression, and d
is the delete command which delete the current line and skip to the next line. As sed
default behavior is to print the lines, it will print the lines after TERMINATE
to the end of input.
If you want the lines before TERMINATE
:
sed -e '/TERMINATE/,$d'
And if you want both lines before and after TERMINATE
in two different files in a single pass:
sed -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file
The before and after files will contain the line with terminate, so to process each you need to use:
head -n -1 before
tail -n +2 after
IF you do not want to hard code the filenames in the sed script, you can:
before=before.txt
after=after.txt
sed -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" file
But then you have to escape the $
meaning the last line so the shell will not try to expand the $w
variable (note that we now use double quotes around the script instead of single quotes).
I forgot to tell that the new line is important after the filenames in the script so that sed knows that the filenames end.
How would you replace the hardcoded TERMINATE
by a variable?
You would make a variable for the matching text and then do it the same way as the previous example:
matchtext=TERMINATE
before=before.txt
after=after.txt
sed -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" file
to use a variable for the matching text with the previous examples:
## Print the line containing the matching text, till the end of the file:
## (from the matching line to EOF, including the matching line)
matchtext=TERMINATE
sed -n -e "/$matchtext/,\$p"
## Print from the line that follows the line containing the
## matching text, till the end of the file:
## (from AFTER the matching line to EOF, NOT including the matching line)
matchtext=TERMINATE
sed -e "1,/$matchtext/d"
## Print all the lines before the line containing the matching text:
## (from line-1 to BEFORE the matching line, NOT including the matching line)
matchtext=TERMINATE
sed -e "/$matchtext/,\$d"
The important points about replacing text with variables in these cases are:
$variablename
) enclosed in single quotes
['
] won't "expand" but variables inside double quotes
["
] will. So, you have to change all the single quotes
to double quotes
if they contain text you want to replace with a variable.sed
ranges also contain a $
and are immediately followed by a letter like: $p
, $d
, $w
. They will also look like variables to be expanded, so you have to escape those $
characters with a backslash [\
] like: \$p
, \$d
, \$w
.As a simple approximation you could use
grep -A100000 TERMINATE file
which greps for TERMINATE
and outputs up to 100,000 lines following that line.
From the man page:
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
A tool to use here is AWK:
cat file | awk 'BEGIN{ found=0} /TERMINATE/{found=1} {if (found) print }'
How does this work:
The other solutions might consume a lot of memory if you use them on very large files.
If I understand your question correctly you do want the lines after TERMINATE
, not including the TERMINATE
-line. AWK can do this in a simple way:
awk '{if(found) print} /TERMINATE/{found=1}' your_file
Explanation:
if(found) print
) will not print anything to start off with.This will print all lines after the TERMINATE
-line.
Generalization:
Example:
$ cat ex_file.txt
not this line
second line
START
A good line to include
And this line
Yep
END
Nope more
...
never ever
$ awk '/END/{found=0} {if(found) print} /START/{found=1}' ex_file.txt
A good line to include
And this line
Yep
$
Explanation:
found
is set.found=1
so that the following lines are printed. Note that this check is done after the actual printing to exclude the start-line from the result.Notes:
BEGIN{found=0}
to the start of the AWK expression.grep -A 10000000 'TERMINATE' file
is much, much faster than sed, especially working on really a big file. It works up to 10M lines (or whatever you put in), so there isn't any harm in making this big enough to handle about anything you hit.
Use Bash parameter expansion like the following:
content=$(cat file)
echo "${content#*TERMINATE}"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With