How to extract specific information from an HTML webpage using Perl

Question

If the text "XYZ 81.6 (-0.1)" needs to be extracted from an HTML webpage like the one below, how can this be done with Perl? Many thanks.

<table border="0" width="100%">
          <caption valign="top">
            <p class="InfoContent"><b><br></b>
          </caption>
          <tr>
            <td colspan="3"><p class="InfoContent"><b>ABC</b></td>
          </tr>
          <tr>
            <td valign="top" height="61" width="31%">
              <p class="InfoContent"><b><font color="#0000FF">XYZ 81.6 (-0.1)&nbsp;<br>22/06/2011</font></b></p>
            </td>
          </tr></table>

mirod · Accepted Answer

I would use HTML::TreeBuilder::XPath for this (and yes, it is a shameless plug!):

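#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder::XPath;

# Parse the HTML file named on the command line.
my $t= HTML::TreeBuilder::XPath->new_from_file( shift @ARGV);

# Text of the blue <font> inside a bold InfoContent paragraph.
my $text= $t->findvalue( '//p[@class="InfoContent"]/b/font[@color="#0000FF"]');

# Drop everything after the first closing parenthesis (the &nbsp; and the date).
$text=~ s{\).*}{)};

print "found '$text'\n";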

It is quite fragile though: as far as I can tell the only way to narrow down the XPath expression to just what you want is to use the font tag and its color attribute. That kind of presentational markup is likely to change in the future, so if (when!) the code breaks, that's where you'll have to look first.
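
Since the XPath is fragile, it may help to check the extracted text before using it, so that a layout change makes the script fail loudly instead of printing garbage. Here is a minimal sketch of that idea using only core Perl. The `parse_quote` helper is hypothetical (it is not part of HTML::TreeBuilder::XPath), and it assumes the text always looks like "NAME VALUE (CHANGE)" followed by anything else. The sample string is roughly what I would expect `findvalue` to return for the HTML in the question.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: split text of the form "NAME VALUE (CHANGE)..."
# into its three fields. Returns an empty list if the text does not match.
sub parse_quote {
    my ($text) = @_;
    return unless defined $text;

    my ($name, $value, $change) =
        $text =~ m{^\s*(\S+)\s+([-+]?\d+(?:\.\d+)?)\s+\(([-+]?\d+(?:\.\d+)?)\)};

    return unless defined $name;    # pattern did not match
    return ($name, $value, $change);
}

# Assumed sample input: the &nbsp; decoded to \xA0, then the date.
my $text = "XYZ 81.6 (-0.1)\xA022/06/2011";

# The list assignment yields 0 in scalar context when nothing is returned,
# so the die fires whenever the text no longer has the expected shape.
my ($name, $value, $change) = parse_quote($text)
    or die "unexpected text '$text': has the page layout changed?\n";

print "name=$name value=$value change=$change\n";
```

In the real script you would pass the result of `findvalue` to `parse_quote` instead of the hard-coded sample string.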
