I know my question title is not that descriptive, so let me explain here.
I am trying to parse the given HTML document using HTML::TreeBuilder. In this document the values 5, 1, ABC and DEF
have to be validated against user-supplied values, and if that validation is successful I have to extract the href
link.
So, my code is:
my @tag = $tree->look_down( _tag => 'tr', class => qr{\bepeven\scompleted\b} );
for (@tag) {
    query_element($_);
}
sub query_element {
    my @td_tag = $_[0]->look_down( _tag => 'td' );
    my $num1 = shift @td_tag;    # Get the first td tag
    my $num2 = shift @td_tag;    # Get the second td tag
    # Make sure the first and second td tags hold a numeric value
    $num1 = $1 if $num1->as_text =~ m!(\d+)! or die "no match found";
    $num2 = $1 if $num2->as_text =~ m!(\d+)! or die "no match found";
    # Validate that the values above match the user-provided values 5 and 1.
    if ( $num1 eq '5' && $num2 eq '1' ) {
        say "hurray..!!";
        # Iterate over the rest of the td tags to make sure we get the right link.
        for (@td_tag) {
            # Check whether it contains ABC and then proceed to fetch the download href link.
            if ( $_->look_down( _tag => 'td', class => qr{[c]}, sub { $_[0]->as_text eq 'ABC'; } ) ) {
                my $text = $_->as_text;
                say "Current node text is: ", $text;    # outputs ABC
                # Now from here how do I get the link I want to extract?
            }
        }
    }
}
My approach is to first extract the values from the td tags and match them against the user-specified values. If that succeeds, I look for another user-specified value, either ABC or DEF (in my case ABC), and only if it matches do I extract the link.
The tag containing ABC or DEF has no fixed position, but it will always be below the tags containing 5 and 1. So I used $_[0]->as_text eq 'ABC'; to check that the tag contains ABC. In my tree I am now at the node whose text is ABC. From here, how do I extract the link's href, i.e. how do I move up the object tree and extract the value?
PS: I would have tried XPath here, but the position of the HTML elements is not that well-defined and structured.
EDIT:
So I tried $_->tag() and it returned td. But if I am on the td tag, then why doesn't the following code work?
my $link_obj = $_->look_down(_tag => 'a'); # It should look for the `a` tag.
say $link_obj->as_text;
It gives the following error:
Can't call method "as_text" on an undefined value.
I hope the following (using my own Marpa::R2::HTML) is helpful. Note that the HTML::TreeBuilder answer finds only one answer. The code below finds two, which I think was the intention.
#!perl
use Marpa::R2::HTML qw(html);
use 5.010;
use strict;
use warnings;
my $answer = html(
    ( \join q{}, <DATA> ),
    {   td => sub { return Marpa::R2::HTML::contents() },
        a => sub {
            my $href = Marpa::R2::HTML::attributes()->{href};
            return undef if not defined $href;
            return [ link => $href ];
        },
        'td.c' => sub {
            my @values = @{ Marpa::R2::HTML::values() };
            if ( ref $values[0] eq 'ARRAY' ) { return $values[0] }
            return [ test => 'OK' ] if Marpa::R2::HTML::contents eq 'ABC';
            return [ test => 'OK' ] if Marpa::R2::HTML::contents eq 'DEF';
            return [ test => '' ];
        },
        tr => sub {
            my @cells = @{ Marpa::R2::HTML::values() };
            return undef if shift @cells != 5;
            return undef if shift @cells != 1;
            my $ok = 0;
            my $link;
            for my $cell (@cells) {
                my ( $type, $value ) = @{$cell};
                $ok = 1 if $type eq 'test' and $value eq 'OK';
                $link = $value if $type eq 'link';
            }
            return $link if $ok;
            return undef;
        },
        ':TOP' => sub { return Marpa::R2::HTML::values(); }
    }
);
die "No parse" if not defined $answer;
say join "\n", @{$answer};
__DATA__
<table>
<tbody>
<tr class="epeven completed">
<td>5</td>
<td>1</td>
<td class="c">ABC</td>
<td class="c">satus</td>
<td class="c"><a href="/path/link">Download</a></td>
</tr>
<tr class="epeven completed">
<td>5</td>
<td>1</td>
<td class="c">status</td>
<td class="c">DEF</td>
<td class="c"><a href="/path2/link">Download</a></td>
</tr>
</table>
I'm not certain I understand what you're looking to do, but is it something along these lines? Use look_down to describe what you want; there's no need to navigate around the tree yourself, which is going to be fragile.
use strict;
use warnings;
use HTML::TreeBuilder 5 -weak;
use 5.014;
my $tree = HTML::TreeBuilder->new_from_content(<DATA>);
for my $e (
    $tree->look_down(
        _tag => 'a',
        sub {
            my $e = $_[0];
            # Assumes the structure tr > td > a. Could also use ->lineage
            # to search up through the containing elements.
            my $tr = $e->parent->parent;
            # Guard against a missing class attribute to avoid an "uninitialized" warning.
            return unless $tr->tag eq 'tr' and ( $tr->attr('class') // '' ) eq 'epeven completed';
            return ( $tr->look_down( _tag => 'td', sub { $_[0]->as_text eq '1'; } )
                    and $tr->look_down( _tag => 'td', sub { $_[0]->as_text eq '5'; } )
                    and $tr->look_down( _tag => 'td', class => 'c', sub { $_[0]->as_text eq 'ABC'; } )
            );
        }
    )
    )
{
    say $e->attr('href');
}
__DATA__
<table>
<tbody>
<tr class="epeven completed">
<td>5</td>
<td>1</td>
<td class="c">ABC</td>
<td class="c">satus</td>
<td class="c"><a href="/path/link">Download</a></td>
</tr>
<tr class="epeven completed">
<td>5</td>
<td>1</td>
<td class="c">status</td>
<td class="c">DEF</td>
<td class="c"><a href="/path2/link">Download</a></td>
</tr>
</table>
Output:
/path/link
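As for the EDIT in the question: look_down only searches the element it is called on and that element's descendants. If the markup looks like the sample data above, the a tag sits in a sibling td, not inside the td holding ABC, so look_down on that td finds nothing and returns undef. Going up to the row first and searching from there avoids that. Below is a minimal sketch of a row-based variant, assuming the page resembles the sample markup used in the answers. find_links is a hypothetical helper name, not part of HTML::TreeBuilder, and the HTML is copied from the answers rather than from the real page.

```perl
use strict;
use warnings;
use 5.010;
use HTML::TreeBuilder;

# Sample markup copied from the answers above; the real page may differ.
my $html = <<'HTML';
<table>
<tbody>
<tr class="epeven completed">
<td>5</td>
<td>1</td>
<td class="c">ABC</td>
<td class="c">satus</td>
<td class="c"><a href="/path/link">Download</a></td>
</tr>
<tr class="epeven completed">
<td>5</td>
<td>1</td>
<td class="c">status</td>
<td class="c">DEF</td>
<td class="c"><a href="/path2/link">Download</a></td>
</tr>
</tbody>
</table>
HTML

# Hypothetical helper: returns the href of every matching row whose first
# two cells equal $want1 and $want2 and whose later cells contain a label.
sub find_links {
    my ( $tree, $want1, $want2, @labels ) = @_;
    my %is_label = map { $_ => 1 } @labels;
    my @links;
    for my $tr ( $tree->look_down( _tag => 'tr', class => qr/\bepeven\s+completed\b/ ) ) {
        my @td = $tr->look_down( _tag => 'td' );
        next if @td < 2;
        # Compare the first two cells against the user-supplied values.
        next unless $td[0]->as_trimmed_text eq $want1
            and $td[1]->as_trimmed_text eq $want2;
        # The label may be in any of the remaining cells.
        next unless grep { $is_label{ $_->as_trimmed_text } } @td[ 2 .. $#td ];
        # Search from the row, not from the label's td, to reach the link.
        my $anchor = $tr->look_down( _tag => 'a', sub { defined $_[0]->attr('href') } )
            or next;
        push @links, $anchor->attr('href');
    }
    return @links;
}

my $tree = HTML::TreeBuilder->new_from_content($html);
my @abc  = find_links( $tree, 5, 1, 'ABC' );
my @both = find_links( $tree, 5, 1, 'ABC', 'DEF' );
say for @both;
$tree->delete;
```

Passing only 'ABC' gives the single link shown in the output above, while passing both 'ABC' and 'DEF' gives both links, as in the Marpa::R2::HTML answer. From inside the question's loop, the equivalent one-off fix would be $_->parent->look_down( _tag => 'a' ), again assuming the td's parent is the row.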