
Stripping blocks of text from huge text file

Tags: sed, awk

I've been tasked with something quite painful and I was wondering if anyone could help.

Our vendor has provided an SNMP mib file (txt). Unfortunately, an awful lot of this file is outdated and needs to be stripped out for our monitoring app.

I've been trying to do this by hand, but it's over 800,000 lines long, and it's sapping my will to live.

The structure is something like:

-- /*********************************************************************************/
-- /* MIB table for Hardware                                                        */
-- /* Valid from: 543.44                                                            */
-- /* Deprecated from: 600.3                                                        */
-- /*********************************************************************************/

Some text 
some text 
Some text

-- /*********************************************************************************/
-- /* MIB table for Hardware                                                        */
-- /* Valid from: 543.44                                                            */
-- /*********************************************************************************/

Some text 
some text 
Some text

-- /*********************************************************************************/
-- /* MIB table for Hardware                                                        */
-- /* Valid from: 364.44                                                            */
-- /* Deprecated from: 594.3                                                        */
-- /*********************************************************************************/

Repeated at random and ad nauseam

What I'm thinking of is a script that would find the text "Deprecated from" and then:

delete that line,
delete the preceding 3 lines,
delete the following one line,
delete all following lines until the next
"-- /*********************************************************************************/"

Does this make sense? Is this kind of thing possible, or am I only dreaming?

Thank you!

Laptopgrrl asked Feb 01 '12 00:02


1 Answer

Edit: I just realized I had misread your question, even after this answer had been upvoted a few times. My earlier response was off! This version should be more correct, but it relies on some additional assumptions. Simple solutions can only get you so far!

This might be able to help you out, with a few assumptions:

cat -s data | awk -vFS='\n' -vRS='\n\n' '/Deprecated from/ { getline; next } 1'

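The cat -s command is there only to squeeze runs of blank lines down to a single blank line, so awk can operate more easily. As for awk, -vFS='\n' tells it that fields are separated by newlines, so each line of a record is one field, and -vRS='\n\n' tells it that records are separated by two newlines in a row, that is, by a blank line (gawk and mawk accept a multi-character RS, but POSIX leaves its behavior unspecified). Then /Deprecated from/ matches any record that contains that text, and { getline; next } reads in the record after it (the text block belonging to the deprecated header) and moves on without printing either one. The trailing 1 is an always-true pattern with no action, so awk's default action prints every record that gets that far.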
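
To make the record splitting concrete, here is a minimal sketch rather than part of the original command. It assumes a POSIX awk and swaps RS='\n\n' for RS='', the portable "paragraph mode" in which runs of blank lines separate records. The function names show_records and skip_deprecated are hypothetical, chosen only for this illustration.

```shell
# show_records (hypothetical helper): label every blank-line-separated
# record with its record number so the boundaries are visible.
show_records() {
    # RS='' is POSIX paragraph mode: one or more blank lines end a record.
    awk -v RS='' '{ printf "[%d] %s\n", NR, $0 }'
}

# skip_deprecated (hypothetical helper): same labelling, but a record that
# mentions "Deprecated from" is dropped together with the record after it.
skip_deprecated() {
    awk -v RS='' '
        /Deprecated from/ { getline; next }   # swallow header and its text block
        { printf "[%d] %s\n", NR, $0 }        # everything else is printed
    '
}
```

Fed a deprecated header, its text block, a current header, and its text block, show_records labels four records, while skip_deprecated prints only the ones labelled [3] and [4].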

This will assume the following:

  • All comment and text blocks are separated by at least one blank line on either side
  • Comment blocks and text blocks strictly alternate, so every comment block is followed by exactly one text block
  • There aren't blank lines within the text blocks

So it might not be quite perfect for you. If these assumptions are okay, it makes awk a nice choice for this job, as you can see: the script is tiny!

$ cat -s data | awk -vFS='\n' -vRS='\n\n' '/Deprecated from/ { getline; next } 1'
-- /*********************************************************************************/
-- /* MIB table for Hardware                                                        */
-- /* Valid from: 543.44                                                            */
-- /*********************************************************************************/
Some text
some text
Some text

As you can see, the blank lines between the remaining blocks are lost, because awk prints each record followed by a single newline. To restore them, you could modify the command like this:

$ cat -s data | awk -vFS='\n' -vRS='\n\n' '/Deprecated from/ { getline; next } { printf "%s\n\n", $0 }'
-- /*********************************************************************************/
-- /* MIB table for Hardware                                                        */
-- /* Valid from: 543.44                                                            */
-- /*********************************************************************************/

Some text
some text
Some text
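
If the assumptions above don't hold (say, a text block contains blank lines), a line-oriented variant that follows the steps in the question may be more forgiving. This is only a sketch: it assumes every header opens and closes with a banner line of asterisks, as in the sample, and the function name strip_deprecated is hypothetical.

```shell
# strip_deprecated (hypothetical name): drop every header that mentions
# "Deprecated from" plus all lines up to the next opening banner.
strip_deprecated() {
    awk '
        # A banner is a comment line holding only asterisks between the slashes.
        /^-- \/\*+\/[ \t]*$/ {
            if (!in_header) {           # opening banner: start buffering
                in_header = 1
                header = $0
                deprecated = 0
            } else {                    # closing banner: header is complete
                in_header = 0
                skip = deprecated       # a deprecated header hides its body too
                if (!skip) print header ORS $0
            }
            next
        }
        in_header {                     # comment lines between the two banners
            header = header ORS $0
            if (/Deprecated from/) deprecated = 1
            next
        }
        !skip                           # body lines: print unless suppressed
    ' "$@"
}
```

It reads standard input or any files named as arguments, so it could be run as strip_deprecated vendor.mib > cleaned.mib (both file names are placeholders). Because it never relies on blank lines, cat -s isn't needed and the original spacing of the kept blocks survives.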

Dan Fego answered Nov 15 '22 10:11