I am trying to parse an HTML document with awk.
The document contains several <div class="p_header_bottom"></div blocks
<div class="p_header_bottom">
<span class="fl_r"></span>
287,489 people
</div>
<div class="p_header_bottom">
<span class="fl_r"></span>
5 links
</div>
I am using
awk '/<div class="p_header_bottom">/,/<\/div>/'
to receive all such div's.
How I can get 287,489 number from first one?
Actually awk '/<\/span>/,/people/' doesn't work correctly.
With gawk, and assuming that the only digits and commas within each <div> </div> block occur in the numeric portion of interest
awk -v RS='<[/]?div[^>]*>' '/span/ && /people/{gsub(/[^[:digit:],]/, ""); print}' file.txt
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With