If I need to extract the text "XYZ 81.6 (-0.1)" from an HTML page like the one below, how can I do it with Perl? Many thanks.
<table border="0" width="100%">
<caption valign="top">
<p class="InfoContent"><b><br></b>
</caption>
<tr>
<td colspan="3"><p class="InfoContent"><b>ABC</b></td>
</tr>
<tr>
<td valign="top" height="61" width="31%">
<p class="InfoContent"><b><font color="#0000FF">XYZ 81.6 (-0.1) <br>22/06/2011</font></b></p>
</td>
</tr></table>
I would use HTML::TreeBuilder::XPath for this (and yes, it is a shameless plug!):
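#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
# Parse the HTML file named on the command line.
my $file = shift @ARGV or die "usage: $0 file.html\n";
my $t = HTML::TreeBuilder::XPath->new_from_file($file);
# Text of the blue <font> inside a bold InfoContent paragraph.
my $text = $t->findvalue('//p[@class="InfoContent"]/b/font[@color="#0000FF"]');
# Drop everything after the first closing parenthesis (the date).
$text =~ s{\).*}{)}s;
print "found '$text'\n";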
It is quite fragile though: as far as I can tell the only way to narrow down the XPath expression to just what you want is to use the font tag. That is likely to change in the future, so if (when!) the code breaks, that's where you'll have to look first.
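If the font tag does change, one possible fallback is to stop depending on the markup and match on the shape of the text instead. The sketch below is only an illustration under assumptions: extract_quote is a hypothetical helper name, and the regex assumes the value always looks like an uppercase label, a number, and a parenthesised change, which is a guess based on the single sample in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: return the first "LABEL number (change)" string
# found in any <p class="InfoContent">, or undef if nothing matches.
sub extract_quote {
    my ($html) = @_;
    my $t = HTML::TreeBuilder::XPath->new_from_content($html);
    my $found;
    for my $p ( $t->findnodes('//p[@class="InfoContent"]') ) {
        my $text = $p->as_text;
        # Assumed format: uppercase label, number, signed number in parens.
        if ( $text =~ /([A-Z]+\s+-?\d+(?:\.\d+)?\s+\([-+]?\d+(?:\.\d+)?\))/ ) {
            $found = $1;
            last;
        }
    }
    $t->delete;    # release the parse tree
    return $found;
}

# When run as a script, read the HTML file named on the command line.
unless (caller) {
    my $file = shift @ARGV or die "usage: $0 file.html\n";
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    my $html = do { local $/; <$fh> };
    close $fh;
    my $quote = extract_quote($html);
    print defined $quote ? "found '$quote'\n" : "nothing found\n";
}
```

This trades one kind of fragility for another: it survives markup changes inside the paragraph, but breaks if the text format itself changes, so the regex is the first place to look in that case.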