I'm working on a homework assignment that restricts me to using only sed to filter an input file into a particular output format. Here is the input file (named stocks):
Symbol;Name;Volume
================================================
BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453
================================================
And the output needs to be:
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
I did come up with a solution, but it's not efficient. Here is my sed script (named try.sed):
/.*;.*;[0-9].*/ { N
N
N
N
N
N
s/\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*/\1, \2, \3, \4, \5, \6, \7/gp
}
The command that I run on shell is:
$ sed -nf try.sed stocks
My question is, is there a better way of using sed to get the same result? The script I wrote only works with exactly 7 lines of data. If the data is longer, I have to modify my script again. I'm not sure how I can make it any better, so I'm here asking for help!
Thanks for any recommendations.
One more way using sed:
sed -ne '/^====/,/^====/ { /;/ { s/;.*$// ; H } }; $ { g ; s/\n// ; s/\n/, /g ; p }' stocks
Output:
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
Explanation:
-ne # Process each input line without printing and execute next commands...
/^====/,/^====/ # For all lines between these...
{
/;/ # If line has a semicolon...
{
s/;.*$// # Remove characters from first semicolon until end of line.
H # Append content to 'hold space'.
}
};
$ # In last input line...
{
g # Copy content of 'hold space' to 'pattern space' to work with it.
s/\n// # Remove first newline character.
s/\n/, /g # Substitute the rest with output separator, comma in this case.
p # Print to output.
}
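Since the question was about not hard-coding the number of rows, here is a small check of the one-liner above against both the full sample and a shortened copy. It is only a sketch: the function name join_symbols and the file stocks_short are made up for illustration, and GNU sed is assumed, which may be needed for the one-line brace syntax.

```shell
#!/bin/sh
# join_symbols is a hypothetical wrapper around the one-liner shown above.
join_symbols() {
  sed -ne '/^====/,/^====/ { /;/ { s/;.*$// ; H } }; $ { g ; s/\n// ; s/\n/, /g ; p }' "$1"
}

workdir=$(mktemp -d)

# The sample input from the question.
cat > "$workdir/stocks" <<'EOF'
Symbol;Name;Volume
================================================
BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453
================================================
EOF

# Same layout with only the first three data rows (header, rule, 3 rows, rule).
{ head -n 5 "$workdir/stocks"; tail -n 1 "$workdir/stocks"; } > "$workdir/stocks_short"

full=$(join_symbols "$workdir/stocks")
short=$(join_symbols "$workdir/stocks_short")
echo "$full"
echo "$short"

rm -rf "$workdir"
```

Nothing in the command counts rows, so the same script should handle any number of data lines between the two ==== rules.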
Edit: I've edited my algorithm, since I had neglected to consider the header and footer (I thought they were just for our benefit).
sed, by its design, accesses every line of an input file, and then performs expressions on ones that match some specification (or none). If you're tailoring your script to a certain number of lines, you're definitely doing something wrong! I won't write you a script since this is homework, but the general idea for one way to go about it is to write a script that does the following. Think of the ordering as the order things should be in a script.
[...] d, which deletes the pattern space and immediately moves on to the next line.
[...] (;) with a comma-and-space (", ") using the s (substitute) command.
[...] (H).
That being said, that's just one way to go about it. sed often offers several ways of varying complexity to accomplish a task. A solution I wrote with this method is 10 lines long.
As a note, I don't bother suppressing printing (with -n) or manually printing (with p); each line is printed by default. My script runs like this:
$ sed -f companies.sed companies
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
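For readers who want to see the shape of such a script, here is one possible sketch that relies on default printing instead of -n and p. It is not the 10-line script mentioned above, which was never posted. The file name join_default.sed is hypothetical, the input is the question's stocks file rather than companies, and the sketch assumes the header is exactly the first two lines and the footer rule is the last line of the file.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# The sample input from the question.
cat > "$workdir/stocks" <<'EOF'
Symbol;Name;Volume
================================================
BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453
================================================
EOF

# join_default.sed (hypothetical name): no -n, so whatever is left in the
# pattern space at the end of a cycle is printed automatically.
cat > "$workdir/join_default.sed" <<'EOF'
# Drop the two header lines (assumed to be lines 1 and 2).
1,2d
# Every line except the last: keep the symbol, stash it, print nothing.
$!{
  s/;.*//
  H
  d
}
# Last line (assumed to be the footer rule): replace it with the hold space.
g
# Strip the leading newline left by the first H, then join the rest.
s/^\n//
s/\n/, /g
EOF

result=$(sed -f "$workdir/join_default.sed" "$workdir/stocks")
echo "$result"

rm -rf "$workdir"
```

The d inside the block is what keeps the data lines from being printed one by one; only the final, joined pattern space reaches the output.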