Perl Mechanize find all links within Div

Is there a way to find all links within a specific div using WWW::Mechanize?

I tried find_all_links, but I couldn't see how to restrict it to a single div. For example:

<div class="sometag">
<ul class="tags">
<li><a href="/a.html">A</a></li>
<li><a href="/b.html">B</a></li> 
</ul>
</div>
REALFREE asked Jan 19 '23 10:01


2 Answers

A handy tool for extracting data from HTML is HTML::Grabber. It uses jQuery-style selectors to reference elements, so you can pass it the page content that Mechanize fetched and do something like this:

use HTML::Grabber;

# Your mechanize stuff here ...

my $dom = HTML::Grabber->new( html => $mech->content );

my @links;
$dom->find('div.sometag a')->each(sub {
    push @links, $_->attr('href');
});
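
The href values collected this way are whatever the page wrote, which for the markup in the question means relative paths such as /a.html. If absolute URLs are needed, one possible approach is to resolve them with the URI module. This is a minimal sketch, not part of the original answer: the base URL below is a hypothetical placeholder, and in a Mechanize script `$mech->base` would normally supply it.

```perl
use strict;
use warnings;
use URI;

# Hypothetical base URL for illustration; with Mechanize, use $mech->base instead.
my $base = 'http://example.com/dir/page.html';

# The relative hrefs from the question's sample markup.
my @links = ( '/a.html', '/b.html' );

# Resolve each href against the base to get an absolute URL string.
my @absolute = map { URI->new_abs( $_, $base )->as_string } @links;

print "$_\n" for @absolute;
```

With the placeholder base above, this prints http://example.com/a.html and http://example.com/b.html.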
Grant McLean answered Jan 21 '23 23:01


Web::Scraper is another option. It takes CSS selectors and can use your Mechanize object as its user agent.

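use strict;
use warnings;
use URI;
use WWW::Mechanize;
use Web::Scraper;

my $mech = WWW::Mechanize->new;
$mech->env_proxy;
# If you need to log in, do it with Mechanize here.

# Collect the href of every <a> inside div.sometag.
# (In the sample markup the "tags" class is on the <ul>, not the <li>,
# so a selector such as 'li.tags a' would match nothing.)
my $staff = scrape { process 'div.sometag a', 'links[]' => '@href' };
# Pass the Mechanize object to the scraper as its user agent.
$staff->user_agent($mech);

my $res = $staff->scrape( URI->new("http://example.com/") );
# $res->{links} is missing when nothing matched, so fall back to an empty list.
for my $link (@{ $res->{links} || [] }) {
    warn $link;
}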

Sorry, I didn't test this code.
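
As a further option that neither answer above mentions, the same filtering can probably be done with HTML::TreeBuilder, which WWW::Mechanize already lists as a dependency. The following is only a sketch under stated assumptions: the markup is hard-coded from the question (plus a hypothetical second div to show that its link is excluded), whereas a real script would parse `$mech->content` instead.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup from the question; in a real script use $mech->content.
# The second div is a hypothetical addition to show what gets filtered out.
my $html = <<'HTML';
<div class="sometag">
<ul class="tags">
<li><a href="/a.html">A</a></li>
<li><a href="/b.html">B</a></li>
</ul>
</div>
<div class="other"><a href="/c.html">C</a></div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @links;
# Find each <div class="sometag">, then every <a> with a non-empty href below it.
for my $div ( $tree->look_down( _tag => 'div', class => 'sometag' ) ) {
    push @links,
        map { $_->attr('href') }
        $div->look_down( _tag => 'a', href => qr/./ );
}
$tree->delete;    # release the parse tree

print "$_\n" for @links;
```

For the sample markup this prints /a.html and /b.html and skips /c.html. Note that `class => 'sometag'` matches the attribute value exactly; a div carrying several classes would need a regular expression such as `class => qr/\bsometag\b/` instead.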

mattn answered Jan 21 '23 22:01