I've got a shell script outputting data like this:
1234567890 * 1234567891 *
I need to remove JUST the last three characters " *". I know I can do it via
(whatever) | sed 's/\(.*\).../\1/'
But I DON'T want to use sed for speed purposes. It will always be the same last 3 characters.
Any quick way of cleaning up the output?
In this method, you use the rev command together with cut. The rev command reverses a line of text character by character. Here, rev reverses the string, cut -c 4- removes the first three characters (which were originally the last three), and a second rev reverses the string back again, giving you your output.
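A minimal sketch of that rev/cut pipeline (the sample string here is just an illustration, not the asker's actual output):

```shell
# rev reverses the line, cut -c 4- drops the first three characters
# (formerly the last three), and the second rev restores the order.
echo '987654321XXX' | rev | cut -c 4- | rev
# prints: 987654321
```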
sed is a free and open-source stream editor for Linux, macOS, *BSD, and other Unix-like systems. It is well suited to removing/deleting the last character, or performing other operations, on your files or shell variables.
To remove the last n characters of a string, we can use the parameter expansion syntax ${str::-n} in the Bash shell, where n is the number of characters to remove from the end of the string.
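For example, a quick sketch in Bash (note that the negative-length form ${str::-n} needs Bash 4.2 or later; the sample string is illustrative):

```shell
str='987654321XXX'
# Drop the last three characters via parameter expansion.
echo "${str::-3}"
# prints: 987654321
```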
Here's an old-fashioned unix trick for removing the last 3 characters from a line that makes no use of sed OR awk...
> echo 987654321 | rev | cut -c 4- | rev
987654
Unlike the earlier example using 'cut', this does not require knowledge of the line length.
I can guarantee you that bash alone won't be any faster than sed for this task. Starting up external processes in bash is a generally bad idea, but only if you do it a lot. So, if you were starting a sed process for each line of your input, I'd be concerned. But you're not. You only need to start one sed, which will do all the work for you.
You may however find that the following sed will be a bit faster than your version:

(whatever) | sed 's/...$//'
All this does is remove the last three characters on each line, rather than substituting the whole line with a shorter version of itself. Maybe more modern RE engines can optimise your command, but why take the risk?
To be honest, about the only way I can think of that would be faster would be to hand-craft your own C-based filter program. And the only reason that may be faster than sed is that you can take advantage of the extra knowledge you have about your processing needs (sed has to allow for generalised processing, so it may be slower because of that).
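As a quick sanity check (a sketch; the sample line is made up), both sed forms strip the same three trailing characters:

```shell
# The capture-group version substitutes the whole line with a shorter copy;
# the anchored version just deletes the last three characters.
printf 'abcdefXXX\n' | sed 's/\(.*\).../\1/'
printf 'abcdefXXX\n' | sed 's/...$//'
# both print: abcdef
```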
Don't forget the optimisation mantra: "Measure, don't guess!"
If you really want to do this one line at a time in bash (and I still maintain that it's a bad idea), you can use:
pax> line=123456789abc
pax> line2=${line%%???}
pax> echo ${line2}
123456789
pax> _
You may also want to investigate whether you actually need a speed improvement. If you process the lines as one big chunk, you'll see that sed is plenty fast. Type in the following:
#!/usr/bin/bash
echo This is a pretty chunky line with three bad characters at the end.XXX >qq1
for i in 4 16 64 256 1024 4096 16384 65536 ; do
    cat qq1 qq1 >qq2
    cat qq2 qq2 >qq1
done
head -20000l qq1 >qq2
wc -l qq2
date
time sed 's/...$//' qq2 >qq1
date
head -3l qq1
and run it. Here's the output on my (not very fast at all) R40 laptop:
pax> ./chk.sh
20000 qq2
Sat Jul 24 13:09:15 WAST 2010

real    0m0.851s
user    0m0.781s
sys     0m0.050s
Sat Jul 24 13:09:16 WAST 2010
This is a pretty chunky line with three bad characters at the end.
This is a pretty chunky line with three bad characters at the end.
This is a pretty chunky line with three bad characters at the end.
That's 20,000 lines in under a second, pretty good for something that's only done every hour.