I have a long .txt file (LONG.txt). In that file, I want to search for three patterns and capture the grep results in a new file (SHORT.txt).
Patterns:
AAAAA
BBBBB
CCCCC
NOTE:
When pattern AAAAA or BBBBB is found, I want to print only the line that contains AAAAA or BBBBB.
When pattern CCCCC is found, I want to print the line that contains CCCCC plus the next line.
Example:
LONG.txt:
bla bla
bla bla
bla bla
something something AAAAA something something
bla bla
bla bla
something something CCCCC something something
bla bla
bla bla
bla bla
bla bla
bla bla
bla bla
something something BBBBB something something
bla bla
bla bla
bla bla
something something AAAAA something something
bla bla
something something AAAAA something something
bla bla
something something BBBBB something something
bla bla
bla bla
bla bla
something something CCCCC something something
bla bla
bla bla
bla bla
Output should be:
something something AAAAA something something
something something CCCCC something something
bla bla
something something BBBBB something something
something something AAAAA something something
something something AAAAA something something
something something BBBBB something something
something something CCCCC something something
bla bla
What I tried is:
grep -B0 "AAAAA" LONG.txt > SHORT.txt
grep -B0 "BBBBB" LONG.txt > SHORT.txt
grep -B1 "CCCCC" LONG.txt > SHORT.txt
But this doesn't give me the desired output.
Your code keeps overwriting the file because every command redirects with a single arrow (>), which truncates SHORT.txt before writing to it.
Use a single arrow (>) for the first command and double arrows (>>) for the later ones so that they append to the file.
grep "AAAAA" LONG.txt > SHORT.txt
grep "BBBBB" LONG.txt >> SHORT.txt
grep -A1 "CCCCC" LONG.txt >> SHORT.txt
The first two grep commands print just the matching lines, and the last one prints each matching line plus the line after it.
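The difference between the two redirections is easy to check on a scratch file. This is only a minimal sketch; the temporary file and the strings written to it are made up for illustration.

```shell
# Scratch file (hypothetical), so no real data is touched.
scratch=$(mktemp)

echo "first"  > "$scratch"
echo "second" > "$scratch"     # > truncates: "first" is gone
overwritten=$(cat "$scratch")

echo "first"  > "$scratch"
echo "second" >> "$scratch"    # >> appends: both lines survive
appended=$(cat "$scratch")

printf '%s\n---\n%s\n' "$overwritten" "$appended"
rm -f "$scratch"
```

This is why the original attempt ended up with only the results of the last grep in SHORT.txt.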
Additional explanation of grep:
By default it prints just the matching lines. If you pass the -A flag with a number, it prints each matching line and that many lines after it. E.g. -A1 prints the matching line and the next line, as you requested. Similarly, the -B flag prints lines before the match.
Remember: -A = After, -B = Before.
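To see the two flags side by side, here is a small sketch on a throwaway file. The file name demo.txt and its contents are assumptions made up for the demonstration, not part of the question.

```shell
# Hypothetical sample file, created in a temp directory so nothing is overwritten.
demo_dir=$(mktemp -d)
printf 'one\ntwo CCCCC\nthree\nfour\n' > "$demo_dir/demo.txt"

# Matching line only.
plain=$(grep "CCCCC" "$demo_dir/demo.txt")

# Matching line plus the one line after it.
after=$(grep -A1 "CCCCC" "$demo_dir/demo.txt")

# Matching line plus the one line before it.
before=$(grep -B1 "CCCCC" "$demo_dir/demo.txt")

printf '%s\n---\n%s\n---\n%s\n' "$plain" "$after" "$before"
rm -rf "$demo_dir"
```

With -B1 the output starts one line too early, which is what the attempt in the question ran into for CCCCC.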
UPDATE
There's the additional requirement that the output lines keep the order in which they appeared in the original file.
Here's a script to do it:
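# Collect matches prefixed with line numbers ("N:" for matches, "N-" for context lines).
grep -n "AAAAA" LONG.txt > SHORT.txt
grep -n "BBBBB" LONG.txt >> SHORT.txt
# With -A1, grep prints "--" between groups of matches; drop those separator lines.
grep -n -A1 "CCCCC" LONG.txt | grep -v '^--$' >> SHORT.txt
# Restore the original file order, then strip the line-number prefix.
sort -n -o SHORT.txt SHORT.txt
sed -i 's/^[0-9]\+[:-]//' SHORT.txt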
The only real difference here is that I pass the -n flag to grep so that it prints line numbers, and then use sort to order the file by those numbers. After sorting, the line numbers are still present in the output file, so the sed step strips them.
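Alternatively, a single awk pass keeps the original order without any sorting:
awk '/AAAAA|BBBBB|CCCCC/ { print; if (/CCCCC/ && (getline) > 0) print }' LONG.txt > SHORT.txt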
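A variation on the single-pass awk idea uses a flag instead of getline, so that a CCCCC line directly followed by another CCCCC line is still handled and no line is printed twice. This is a sketch rather than a definitive solution: the variable name carry and the sample input are assumptions made up for the demonstration.

```shell
# Hypothetical sample input in a temp directory; substitute your own LONG.txt.
work=$(mktemp -d)
printf '%s\n' \
  'bla 1' 'x AAAAA x' 'bla 2' 'x CCCCC x' 'bla 3' 'bla 4' \
  'x BBBBB x' 'x CCCCC x' 'x CCCCC y' 'last' > "$work/LONG.txt"

awk '
  # CCCCC line: print it and remember that the following line is wanted too.
  /CCCCC/ { print; carry = 1; next }
  # AAAAA/BBBBB line, or the line right after a CCCCC line: print once.
  /AAAAA|BBBBB/ || carry { print; carry = 0 }
' "$work/LONG.txt" > "$work/SHORT.txt"

result=$(cat "$work/SHORT.txt")
printf '%s\n' "$result"
rm -rf "$work"
```

Because the file is read once from top to bottom, the output order matches the input order by construction.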