
How do I write a sed script to grep information from a text file

My homework restricts me to using only sed to filter an input file into a specific output format. Here is the input file (named stocks):

Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================

And the output needs to be:

BAC, CSCO, INTC, MSFT, VZ, KO, MMM

I did come up with a solution, but it's not efficient. Here is my sed script (named try.sed):

/.*;.*;[0-9].*/ { N
N
N
N
N
N
s/\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*/\1, \2, \3, \4, \5, \6, \7/gp
}

The command that I run on shell is:

$ sed -nf try.sed stocks

My question is: is there a better way to use sed to get the same result? The script I wrote only works with exactly 7 lines of data. If the data grows, I have to modify the script again. I'm not sure how to make it any better, so I'm here asking for help!

Thanks for any recommendations.

Jaycee asked Feb 03 '12 18:02


2 Answers

One more way using sed:

sed -ne '/^====/,/^====/ { /;/ { s/;.*$// ; H } }; $ { g ; s/\n// ; s/\n/, /g ; p }' stocks

Output:

BAC, CSCO, INTC, MSFT, VZ, KO, MMM

Explanation:

-n                # Suppress automatic printing of each input line.
-e                # The next argument is the sed script.
/^====/,/^====/   # For the lines from the first ==== rule to the second (this skips the header)...
{
  /;/             # If the line has a semicolon...
  {
    s/;.*$//      # Remove everything from the first semicolon to the end of the line.
    H             # Append the result to the 'hold space'.
  }
};
$                 # On the last input line...
{
  g               # Copy the 'hold space' into the 'pattern space' to work with it.
  s/\n//          # Remove the leading newline (H puts one before each appended line).
  s/\n/, /g       # Replace the remaining newlines with the output separator, ", ".
  p               # Print the result.
}
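
Since the question runs its script from a file with sed -nf, the same commands can also be laid out one per line in a script file. The following is a minimal sketch of that layout, not part of the original answer: the file name symbols.sed and the scratch directory are hypothetical, and the sample data is copied from the question.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# Sample input, copied from the question.
cat > "$workdir/stocks" <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================
EOF

# symbols.sed is a hypothetical name; the commands are those of the one-liner.
cat > "$workdir/symbols.sed" <<'EOF'
# Only look at lines between the two ==== rules (this skips the header).
/^====/,/^====/{
  # On data lines, keep the symbol and collect it in the hold space.
  /;/{
    s/;.*$//
    H
  }
}
# On the last line, join the collected symbols and print them.
${
  g
  s/\n//
  s/\n/, /g
  p
}
EOF

symbols=$(sed -n -f "$workdir/symbols.sed" "$workdir/stocks")
echo "$symbols"
```

Putting each command on its own line avoids relying on how a particular sed parses semicolons and closing braces inside a one-liner, and the script no longer depends on the number of data lines.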
Birei answered Oct 03 '22 00:10


Edit: I've edited my algorithm, since I had neglected to consider the header and footer (I thought they were just for our benefit).

sed, by design, reads every line of its input and runs each command on the lines that match the command's address, or on every line if no address is given. If you're tailoring your script to a certain number of lines, you're definitely doing something wrong! I won't write you a script since this is homework, but here is the general idea for one way to go about it. The steps are listed in the order they should appear in the script.

  1. Skip the first three lines using d, which deletes the pattern space and immediately moves on to the next line.
  2. For each line that isn't a blank line, do the following steps. (This would all be in a single set of curly braces.)
    1. Replace everything after and including the first semicolon (;) with a comma-and-space (", ") using the s (substitute) command.
    2. Append the current pattern space into the hold buffer (look at H).
    3. Delete the pattern space and move on to the next line, like in step 1.
  3. For each line that gets to this point in the script (should be the first blank line), retrieve the contents of the hold space into the pattern space. (This would be after the curly braces above.)
  4. Substitute all newlines in the pattern space with nothing.
  5. Next, substitute the last comma-and-space in the pattern space with nothing.
  6. Finally, quit the program so you don't process any more lines. My script worked without this, but I'm not 100% sure why.
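
The answer deliberately withholds its script, so what follows is only one possible reading of the six steps above, not the author's actual script. It assumes the input layout shown in the question (three lines before the data, then a blank line after it); the scratch directory is hypothetical, and the file names companies.sed and companies are borrowed from the command shown later in this answer.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# Sample input, copied from the question.
cat > "$workdir/companies" <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================
EOF

cat > "$workdir/companies.sed" <<'EOF'
# Step 1: skip the header, the rule, and the blank line.
1,3d
# Step 2: for every non-blank line...
/./{
  # 2.1: replace everything from the first semicolon on with ", "
  s/;.*/, /
  # 2.2: append the result to the hold space
  H
  # 2.3: delete the pattern space and start the next line
  d
}
# Step 3: only a blank line gets here; fetch the collected symbols.
g
# Step 4: remove the newlines that H inserted.
s/\n//g
# Step 5: drop the trailing ", ".
s/, $//
# Step 6: quit; the pattern space is printed automatically on exit.
q
EOF

result=$(sed -f "$workdir/companies.sed" "$workdir/companies")
echo "$result"
```

On this sample, leaving out q gives the same output: the footer rule is a non-blank line, so it is appended to the hold space and deleted without printing anything. If another blank line followed the footer, the list would be printed a second time with the footer text attached, which is what q guards against.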

That being said, that's just one way to go about it. sed often offers several ways, of varying complexity, to accomplish a task. A solution I wrote with this method is 10 lines long.

As a note, I don't bother suppressing printing (with -n) or manually printing (with p); each line is printed by default. My script runs like this:

$ sed -f companies.sed companies 
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
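
A small illustration of the two printing modes mentioned in the note above, using made-up two-line input rather than the stocks file:

```shell
# Default: every line is printed after the script has run on it.
default_out=$(printf 'one\ntwo\n' | sed 's/one/1/')

# With -n, only lines explicitly printed with the p flag appear.
quiet_out=$(printf 'one\ntwo\n' | sed -n 's/one/1/p')

printf '%s\n---\n%s\n' "$default_out" "$quiet_out"
```

The first command prints both lines (the changed one and the untouched one); the second prints only the line on which the substitution succeeded.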
Dan Fego answered Oct 03 '22 00:10