I've been playing around with a little shell script to get some info out of an HTML page downloaded with lynx.
My problem is that I get this string: <span class="val3">MPPTN: 0.9384</span></td>
I can trim the first part of that using:
trimmed_info=`echo ${info/'<span class="val3">'/}`
And the string becomes: "MPPTN: 0.9384"
But how can I trim the last part? It seems like the "/" is messing with the echo command. I tried:
echo ${finalt/'</span></td>'/};
Not sure if using sed is OK, but one way to extract the number could be something like:
echo '<span class="val3">MPPTN: 0.9384</span></td>' | sed 's/^[^:]*..//' | sed 's/<.*$//'
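If you do go the sed route, the two substitutions can also be folded into a single sed call with a capture group. This is a minimal sketch, assuming each line has the shape shown above (a label, a colon, optional spaces, then the number immediately followed by a closing tag); sed prints lines that don't match unchanged. The variable names html and value are illustrative.

```shell
# Assumption: input looks like <tag>LABEL: NUMBER</tag>...
html='<span class="val3">MPPTN: 0.9384</span></td>'

# One sed call: keep only the digits and dots between the colon (plus any
# spaces) and the "<" that immediately follows them.
value=$(printf '%s\n' "$html" | sed 's/.*: *\([0-9.]*\)<.*/\1/')
echo "$value"    # prints 0.9384
```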
The behavior of ${VARIABLE/PATTERN/REPLACEMENT} depends on which shell you're using and, for bash, which version. Under ksh, or under recent enough (I think ≥ 4.0) versions of bash, ${finalt/'</span></td>'/} strips that substring as desired. Under older versions of bash, the quoting is rather quirky; you need to write ${finalt/<\/span><\/td>/} (which still works in newer versions).
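A minimal bash-only sketch of the escaped-slash form. The value of finalt is an assumption: the question's string after the leading tag has already been removed.

```shell
# Requires bash (or ksh/zsh): ${var/pattern/replacement} is not POSIX.
# Assumed value: the question's string with the leading <span> already gone.
finalt='MPPTN: 0.9384</span></td>'

# Each "/" inside the pattern is backslash-escaped, so the first unescaped
# "/" still separates pattern from (empty) replacement.
trimmed=${finalt/<\/span><\/td>/}
echo "$trimmed"   # prints MPPTN: 0.9384
```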
Since you're stripping a suffix, you can use the ${VARIABLE%PATTERN} or ${VARIABLE%%PATTERN} construct instead. Here, you're removing everything from the first </ onward, i.e. the longest suffix that matches the pattern </*. Similarly, you can strip the leading HTML tags with ${VARIABLE##PATTERN}, which removes the longest prefix matching the pattern (here *>, i.e. everything up to and including the last >).
trimmed=${finalt%%</*}; trimmed=${trimmed##*>}
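The two expansions above can be wrapped in a small function if you need them on several lines of lynx output. This is a sketch; the name strip_tags is hypothetical, and it assumes the markup for one value sits on a single line.

```shell
# Hypothetical helper; uses only POSIX parameter expansion.
strip_tags() {
    _st=$1            # POSIX does not specify "local", so use a distinct name
    _st=${_st%%</*}   # drop the longest suffix that starts with "</"
    _st=${_st##*>}    # drop the longest prefix that ends with ">"
    printf '%s\n' "$_st"
}

finalt='<span class="val3">MPPTN: 0.9384</span></td>'
strip_tags "$finalt"    # prints MPPTN: 0.9384
```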
Added benefit: unlike ${…/…/…}, which is specific to bash/ksh/zsh and works slightly differently in all three, ${…#…} and ${…%…} are specified by POSIX, so they work in any POSIX-compliant sh. They don't do as much, but here they're sufficient.
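Because they're portable, the same operators can also separate the label from the number once the tags are gone. A small plain-sh sketch, assuming exactly one ": " sits between label and number; the names label and number are illustrative.

```shell
# Assumption: the tags are already stripped and the format is "LABEL: NUMBER".
trimmed='MPPTN: 0.9384'
label=${trimmed%%:*}     # remove the longest suffix starting at ":"
number=${trimmed#*: }    # remove the shortest prefix ending in ": "
printf '%s=%s\n' "$label" "$number"   # prints MPPTN=0.9384
```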
Side note: although it didn't cause any problem in this particular instance, you should always put double quotes around variable substitutions, e.g.
echo "${finalt/'</span></td>'/}"
Otherwise the shell performs word splitting and wildcard (glob) expansion on the result. The simple rule is: if you don't have a good reason to leave the double quotes out, put them in.
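A quick way to see what missing quotes do, assuming the default IFS (space, tab, newline): count how many arguments a value turns into with and without quotes. The value below is made up for the demonstration.

```shell
# Made-up value with a run of spaces; assumes the default IFS.
v='MPPTN:   0.9384'
set -- $v
unquoted=$#              # 2: the unquoted value was split on whitespace
set -- "$v"
quoted=$#                # 1: the quoted value is passed through intact
echo "unquoted: $unquoted, quoted: $quoted"   # prints unquoted: 2, quoted: 1
```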