I am doing some BASH shell scripting with curl. If my curl command returns any text, I know I have an error. The text returned by curl is usually HTML. I figured that if I could strip out all of the HTML tags, I could display the resulting text as an error message.
I was thinking of something like this:
sed -E 's/<.*?>//g' <<<$output_text
But I get sed: 1: "s/<.*?>//": RE error: repetition-operator operand invalid
If I replace *? with *, I don't get the error (and I don't get any text either). If I remove the global (g) flag, I get the same error.
This is on Mac OS X.
sed doesn't support non-greedy repetition (the *? operator).
Try
's/<[^>]*>//g'
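For example, applied to the variable from the question (a sketch; output_text is assumed to hold the curl response):
# [^>]* stops at the next >, so no non-greedy operator is needed
error_text=$(sed -E 's/<[^>]*>//g' <<<"$output_text")
echo "$error_text"
This works with the BSD sed shipped with OS X, because it only uses greedy repetition over a negated bracket expression.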
Maybe a parser-based perl solution?
perl -0777 -MHTML::Strip -nlE 'say HTML::Strip->new->parse($_)' file.html
You must install the HTML::Strip module first, e.g. with the command cpan HTML::Strip.
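If the HTML is sitting in a shell variable rather than a file, the same one-liner can read from stdin (a sketch reusing the question's output_text):
# pipe the curl output through HTML::Strip instead of reading a file
printf '%s' "$output_text" | perl -0777 -MHTML::Strip -nlE 'say HTML::Strip->new->parse($_)'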
Alternatively,
you can use a standard OS X utility called textutil (see its man page):
textutil -convert txt file.html
will produce file.txt with the HTML tags stripped, or
textutil -convert txt -stdin -stdout < file.html | some_command
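In a script, the stdin/stdout form also lets you capture the stripped text directly (a sketch; -format html is added here to make the input type explicit, since stdin gives textutil no file extension to infer it from):
error_text=$(textutil -convert txt -format html -stdin -stdout <<<"$output_text")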
Another alternative:
Some systems have the lynx text-only browser installed. You can use:
lynx -dump file.html #or
lynx -stdin -dump < file.html
But in your case, you can probably rely only on pure sed or awk solutions... IMHO.
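For completeness, a minimal awk sketch of the same tag-stripping idea (same limitation as the sed version: it won't catch tags that span multiple lines):
# delete every <...> span on each line, then print the line
awk '{ gsub(/<[^>]*>/, "") } 1' file.html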
But if you have perl (just without the HTML::Strip module), the following is still better than sed:
perl -0777 -pe 's/<.*?>//sg'
because it will also remove multiline tags (which are common), such as:
<a
 href="#"
 class="some"
>link text</a>
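Tied back to the question's scenario, that could look like this (a sketch, assuming output_text holds the curl response):
# -0777 slurps the whole input so the non-greedy match can cross newlines
error_text=$(perl -0777 -pe 's/<.*?>//sg' <<<"$output_text")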